Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: Introduce schema definition. #19

Merged
merged 4 commits into from
Aug 3, 2023
Merged

Conversation

liurenjie1024
Copy link
Collaborator

Initial schema definition. SchemaVisitor, name indexes will come later.

crates/iceberg/src/error.rs Outdated Show resolved Hide resolved
}

/// Set the field's initial default value.
pub fn with_write_default(mut self, value: impl ToString) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_default and initial_default seems should be a value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but currently Value is not defined yet 🤪

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! Although, I feel that the usage of once_cell here is a bit over-designed. Is it possible that we may need to add new fields into StructType, for example, in schema evaluation?

@liurenjie1024
Copy link
Collaborator Author

The Python and java implementation treats schema as immutable data structures, I think we should also follow that. Mutable makes things complicated, especially when we have indexes. For example the name indexing following.

@Xuanwo
Copy link
Member

Xuanwo commented Aug 2, 2023

The Python and java implementation treats schema as immutable data structures, I think we should also follow that. Mutable makes things complicated, especially when we have indexes. For example the name indexing following.

Got it, makes sense now.

@JanKaul
Copy link
Collaborator

JanKaul commented Aug 2, 2023

Should we also include identifier-field-ids?

@liurenjie1024
Copy link
Collaborator Author

Should we also include identifier-field-ids?

Just took a look at it. We need visitor pattern to verify them, for example the types, etc. So I want to postpone it after we introduce schema visitor.

@JanKaul
Copy link
Collaborator

JanKaul commented Aug 3, 2023

I don't entirely understand. Could you elaborate why we would need the visitor pattern for the identifier-field-ids? I was thinking about something similar to the serialized representation.

@liurenjie1024
Copy link
Collaborator Author

liurenjie1024 commented Aug 3, 2023

I don't entirely understand. Could you elaborate why we would need the visitor pattern for the identifier-field-ids? I was thinking about something similar to the serialized representation.

I mean the verification check, you can find the implementation in java: https://github.com/apache/iceberg/blob/1bb853191fd378fb1dfda5a5cb297475b7fc204b/api/src/main/java/org/apache/iceberg/Schema.java#L104

cc @Fokko @JanKaul Why I don't include identifier-field-ids here.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, but looks good 👍🏻

crates/iceberg/src/spec/datatypes.rs Show resolved Hide resolved
crates/iceberg/src/spec/schema.rs Show resolved Hide resolved
r#struct: StructType,
schema_id: i32,
highest_field_id: i32,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the identifier-field-ids are missing: https://iceberg.apache.org/spec/#identifier-field-ids

@Fokko Fokko merged commit 67f394e into apache:main Aug 3, 2023
7 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants