Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: Add support for UnboundPartitionSpec. #98

Closed
liurenjie1024 opened this issue Nov 17, 2023 · 9 comments · Fixed by #106
Closed

feat: Add support for UnboundPartitionSpec. #98

liurenjie1024 opened this issue Nov 17, 2023 · 9 comments · Fixed by #106
Labels
good first issue Good for newcomers

Comments

@liurenjie1024
Copy link
Contributor

We can see java implementation as reference.

@liurenjie1024 liurenjie1024 added the good first issue Good for newcomers label Nov 17, 2023
@my-vegetable-has-exploded
Copy link
Contributor

I'd like to have a try

@liurenjie1024
Copy link
Contributor Author

I'd like to have a try

Thanks!

@my-vegetable-has-exploded
Copy link
Contributor

Sorry, I am confusing about this issue. According to iceberg#4360, I think UnboundPartitionSpec provides a build method without a schema and can be later bound to a schema. It seems that the primary action in the bind function involves retrieving the source type from the schema and initializing the transform with the source type.
But in iceberg-rust, our transformFunction don't need source type to build. We determine source type using arrow_array.data_type() likes TransformFunction for Bucket. Do you think we still need this feature?

@liurenjie1024
Copy link
Contributor Author

Hi, @my-vegetable-has-exploded Sorry for the confusion. UnboundPartitionSpec has two important use cases:

  1. In rest api's definition. In this definition, the specId of PartitionSpec, field_id of PartitionFieldSpec are all optional in request(they are readonly.
  2. In TableMetadata builder api. It makes sense that you don't need to specify spec_id, field_id before actual committing transaction, since ids should be managed by catalogs.

Though we can use some other techniques, for example make them Option or use some default value (-1) to simplify it, but I prefer an unbound struct is more clear and type safe.

As with Transform, I think this is not important due to the api difference of java and rust.

@my-vegetable-has-exploded
Copy link
Contributor

Sorry for my misunderstanding and thanks for your patience @liurenjie1024. Without option, UnboundPartitionSpec and UnboundPartitionField may like this?

pub struct UnboundPartitionField {
    /// A source column id from the table’s schema
    pub source_id: i32,
    /// A partition name.
    pub name: String,
    /// A transform that is applied to the source column to produce a partition value.
    pub transform: Transform,
}

pub struct UnboundPartitionField {
    /// A source column id from the table’s schema
    pub source_id: i32,
    /// A partition name.
    pub name: String,
    /// A transform that is applied to the source column to produce a partition value.
    pub transform: Transform,
}

pub struct UnboundPartitionSpec {
    pub fields: Vec<UnboundPartitionField>,
}

impl UnboundPartitionSpec{
         pub fn bind(&self, schema: SchemaRef) -> Result<PartitionSpec> {
         // progress
         }
}

@liurenjie1024
Copy link
Contributor Author

Hi, @my-vegetable-has-exploded I mean I don't want to add Option in current PartitionSpec, but it's ok to add it in UnboundPartitonSpec.

@my-vegetable-has-exploded
Copy link
Contributor

Hi, @my-vegetable-has-exploded I mean I don't want to add Option in current PartitionSpec, but it's ok to add it in UnboundPartitonSpec.

get it!, thanks.

@my-vegetable-has-exploded
Copy link
Contributor

I think I'm still misunderstanding the UnboundPartitionSpec binding process, especially determining the spec_id during committing transaction.
I opened a draft pr,please take a look and leave your comments when you are free!
thanks, @liurenjie1024

@liurenjie1024
Copy link
Contributor Author

I think I'm still misunderstanding the UnboundPartitionSpec binding process, especially determining the spec_id during committing transaction. I opened a draft pr,please take a look and leave your comments when you are free! thanks, @liurenjie1024

Cool, I'll take a look.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants