Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: Support writing to not only 3/7 protocol #693

Merged
merged 5 commits into from
Feb 20, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 18 additions & 9 deletions kernel/src/actions/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -276,17 +276,26 @@ impl Protocol {
/// support the specified protocol writer version and all enabled writer features?
pub fn ensure_write_supported(&self) -> DeltaResult<()> {
match &self.writer_features {
// if min_reader_version = 3 and min_writer_version = 7 and all writer features are
// supported => OK
Some(writer_features)
if self.min_reader_version == 3 && self.min_writer_version == 7 =>
{
Some(writer_features) if self.min_writer_version == 7 => {
// if we're on version 7, make sure we support all the specified features
ensure_supported_features(writer_features, &SUPPORTED_WRITER_FEATURES)
}
// otherwise not supported
_ => Err(Error::unsupported(
"Only tables with min reader version 3 and min writer version 7 with no table features are supported."
)),
Some(_) => {
// there are features, but we're not on 7, so the protocol is actually broken
Err(Error::unsupported(
"Tables with min writer version != 7 should not have table features.",
))
}
None => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need the default case (but with a different error message).

Every writer feature > 1 introduces new capabilities kernel must support (cumulative), with no way to know whether the table actually uses those features. See the full list here.

Versions 2, 3, and 4 are especially problematic because they respectively require column invariants, check constraints, and generated cols -- none of which we could implement because they are underspecified and their implementation is highly spark-specific.

Writer version 6 is also problematic due to identity cols. We maybe could implement support for those, but it would be an annoying time sink -- even blind appends are affected.

So basically, we can only support writer versions 1 and 7.

Meanwhile, we can actually support all three reader versions, because 2 just requires column mapping which we already support.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to partly work around this because some features have an obvious effect on the table's Metadata when they're active. We can probably get away with allowing the protocol version, as long as we can prove that the table doesn't use any unsupported features?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this does cover all cases. Note it's checking what writerFeatures is set to. So in the top two cases there is a writerFeatures in the protocol. If we're not on version 7, that's an error, if we are on 7, we validate the features as before.

The None case is then interesting. We could actually be on any version:

  • 7 is invalid because then there should be writer features, but there aren't.
  • 1 is valid, and we can write to it
  • all the rest are currently invalid.

I've updated the code to enforce that, but I got the sense DBR might go to (1,2) sometimes, so perhaps we should check and if it's 2 we can look in the metadata configuration and make sure it doesn't contain delta.invariants before continuing.

thoughts?

Copy link
Collaborator

@scovich scovich Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree the cases are covered correctly now.

Re (1, 2) -- agree we need probably need to do that at some point. However, in theory the table only has (1, 2) because it is using invariants; otherwise why did it the table need to upgrade in the first place?

Good candidate for a follow-on PR tho.

// no features, we currently only support version 1 in this case
require!(
self.min_writer_version == 1,
Error::unsupported(
"Currently delta-kernel-rs can only write to tables with protocol.minWriterVersion = 1 or 7"
)
);
Ok(())
}
}
}
}
Expand Down
Loading
Loading