Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[WIP] Implement project for Transform. #264 #269

Closed

Conversation

marvinlanhenke
Copy link
Contributor

@marvinlanhenke marvinlanhenke commented Mar 14, 2024

#264
I just wanted to share this as a starting point - and to get some feedback, thoughts about the approach here.

@liurenjie1024 @Xuanwo
Please take a look if this is a viable solution at all.
I'm quite unsure about the fn transform and the handling of the arrow_array - seems kinda clunky?

Anyway, let me know what you think. best regards.

@marvinlanhenke marvinlanhenke marked this pull request as draft March 14, 2024 19:50
@marvinlanhenke marvinlanhenke changed the title wip: prototype project_transform [WIP] Implement project for Transform. #264 Mar 14, 2024
@marvinlanhenke
Copy link
Contributor Author

@liurenjie1024 @Xuanwo
the main design concerns I want to discuss:

  1. exposing op and literals from expressions. For prototyping I did this directly on the expression - it might be better to implement this on BoundPredicate itself and expose op via pub(crate) from the struct?
  2. Once I receive a PrimitiveLiteral I need to convert it to an arrow_array in order to perform the transformation. The conversion to the arrow array could also be implemented on PrimitiveLiteral itself?

I would really appreciate your thoughts on this

PrimitiveLiteral::Long(v) => func
.transform(Arc::new(Int64Array::from_value(v, 1)))?
.as_any()
.downcast_ref::<Int32Array>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Int64Array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdd
if i understood correctly the fn transform implementation of Bucket always returns arrow_array::Int32Array.
https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/transform/bucket.rs#L89

@marvinlanhenke
Copy link
Contributor Author

@liurenjie1024 @sdd @Xuanwo @ZENOTME
I went on and implemented fn project for Transform::Bucket with some design assumptions. PTAL and tell me what you think, before we can move on and implement the missing transforms.
Thank you so much for your effort

@ZENOTME
Copy link
Contributor

ZENOTME commented Mar 18, 2024

Thanks for this job!

I'm quite unsure about the fn transform and the handling of the arrow_array - seems kinda clunky?

I think the transform can provide an interface like transform_literal later.

pub trait TransformFunction: Send {
    /// transform will take an input array and transform it into a new array.
    /// The implementation of this function will need to check and downcast the input to specific
    /// type.
    fn transform(&self, input: ArrayRef) -> Result<ArrayRef>;
+ fn transform_literal(&self, literal:Literal) -> Result<Literal>; 
}

self.op
}

pub(crate) fn literals(&self) -> &FnvHashSet<Datum> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) fn literals(&self) -> &FnvHashSet<Datum> {
pub(crate) fn literals(&self) -> impl Iterator<Item=Datum> {

How about returning iterator only? It's better not expose the inner data structure.

@liurenjie1024
Copy link
Contributor

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

@ZENOTME
Copy link
Contributor

ZENOTME commented Mar 19, 2024

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

I will init the interface as soon as possible to avoid blocking this.

@marvinlanhenke
Copy link
Contributor Author

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

Thank you for providing the reason why we use arrow array in the first place.

@marvinlanhenke
Copy link
Contributor Author

Hi, @marvinlanhenke This pr looks great to me. But I think @ZENOTME is right, we should implement #283 first. The reason we implement transform on arrow array could be found here

I will init the interface as soon as possible to avoid blocking this.

Sounds good to me

@marvinlanhenke marvinlanhenke deleted the wip_project_transform branch April 23, 2024 04:28
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants