-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
fix: fix parse partitions in manifest_list #122
Conversation
crates/iceberg/src/spec/manifest.rs
Outdated
@@ -79,6 +79,22 @@ impl Manifest { | |||
|
|||
Ok(Manifest { metadata, entries }) | |||
} | |||
|
|||
/// Upgrade the format version of this manifest. | |||
pub fn upgrade_version(&mut self, format_version: FormatVersion) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update version seems tricky to me. We don't need to take other actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.
Yes, upgrading the version is more. Here it's more like upgrading this manifest file. #119 (comment)
This interface is confusing. There is another way:
pub fn parse_avro(bs: &[u8], version) -> Result<Self, Error>
. Then we can parse v1 manifest as v2. After we parse it, we will modify the version as v2.pub async fn write(mut self, manifest: Manifest, version)
. And then we can write as v1 manifest as v2. I think this way may be more clear.🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that comment, but I don't think we need to introduce this api for now. The manifest format upgradation is part of table format upgradation, and we need to think about it in the overall design, rather exposing this single api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's remove this interface and only fix the init-default. Let's track the upgrade action in another issue as a new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.
This is not correct, and V2 is forward-compatible. If you apply the rules of schema evolution (as with any other Iceberg table), it should work seamlessly. You don't want to rewrite a multi-petabyte table when you do any meta operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this interface and only fix the init-default.
This will make thing easier, such as setting the sequence-number when the field isn't present (in the case of V1 metadata).
.unwrap_or_default()) | ||
} else { | ||
Err(Error::new( | ||
crate::ErrorKind::DataInvalid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm expecting a better error message with more context, for example, the manifest list file path, the entry path, spec id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added context of entry path and spec id. We don't have the manifest list file path for now (we pass [u8] to parse directly), I think it need to set in the higher level API.
crates/iceberg/src/spec/manifest.rs
Outdated
@@ -79,6 +79,22 @@ impl Manifest { | |||
|
|||
Ok(Manifest { metadata, entries }) | |||
} | |||
|
|||
/// Upgrade the format version of this manifest. | |||
pub fn upgrade_version(&mut self, format_version: FormatVersion) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this concern. Version upgrade is part of transaction api, it involves full rewrite of all things like snapshot, and should not do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -1104,8 +1104,8 @@ mod _serde { | |||
Ok(ManifestEntry { | |||
status: self.status.try_into()?, | |||
snapshot_id: Some(self.snapshot_id), | |||
sequence_number: None, | |||
file_sequence_number: None, | |||
sequence_number: Some(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @ZENOTME
This PR:
partition_type
. #121 and modify unit test to test this case.initial-default
when reading Avro #119, add init default when reading v1 manifestupgrade_version
and then write it back as v2.