-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat: Support writing to not only 3/7 protocol #693
feat: Support writing to not only 3/7 protocol #693
Conversation
@@ -139,136 +149,174 @@ fn new_commit_info() -> DeltaResult<Box<ArrowEngineData>> { | |||
Ok(Box::new(ArrowEngineData::new(commit_info_batch))) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there are a lot of changes in this file, but mainly it's just:
- Allowing
setup
to create a 3/7 or 1/1 table - adding a new function to generate both types of tables and associated store/engine/location
- making each test iterate over those tables instead of just one
"operationParameters": {}, | ||
"engineCommitInfo": { | ||
"engineInfo": "default engine" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for basically every test I just:
- removed the line like:
let table = create_table(store.clone(), table_location, schema, &[]).await?;
and replaced it withfor (table, engine, store, table_name) in setup_tables(schema, &[]).await? {
- updated any hard coded paths to rather use
table_name
It looks like more changes, but if you hide whitespace you'll see it's minimal.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #693 +/- ##
==========================================
- Coverage 84.56% 84.54% -0.02%
==========================================
Files 75 75
Lines 17550 17553 +3
Branches 17550 17553 +3
==========================================
Hits 14841 14841
- Misses 2000 2003 +3
Partials 709 709 ☔ View full report in Codecov by Sentry. |
"Tables with min writer version != 7 should not have table features.", | ||
)) | ||
} | ||
None => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need the default case (but with a different error message).
Every writer feature > 1 introduces new capabilities kernel must support (cumulative), with no way to know whether the table actually uses those features. See the full list here.
Versions 2, 3, and 4 are especially problematic because they respectively require column invariants, check constraints, and generated cols -- none of which we could implement because they are underspecified and their implementation is highly spark-specific.
Writer version 6 is also problematic due to identity cols. We maybe could implement support for those, but it would be an annoying time sink -- even blind appends are affected.
So basically, we can only support writer versions 1 and 7.
Meanwhile, we can actually support all three reader versions, because 2 just requires column mapping which we already support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to partly work around this because some features have an obvious effect on the table's Metadata when they're active. We can probably get away with allowing the protocol version, as long as we can prove that the table doesn't use any unsupported features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this does cover all cases. Note it's checking what writerFeatures
is set to. So in the top two cases there is a writerFeatures
in the protocol. If we're not on version 7, that's an error, if we are on 7, we validate the features as before.
The None
case is then interesting. We could actually be on any version:
- 7 is invalid because then there should be writer features, but there aren't.
- 1 is valid, and we can write to it
- all the rest are currently invalid.
I've updated the code to enforce that, but I got the sense DBR might go to (1,2) sometimes, so perhaps we should check and if it's 2 we can look in the metadata configuration and make sure it doesn't contain delta.invariants
before continuing.
thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree the cases are covered correctly now.
Re (1, 2) -- agree we need probably need to do that at some point. However, in theory the table only has (1, 2) because it is using invariants; otherwise why did it the table need to upgrade in the first place?
Good candidate for a follow-on PR tho.
kernel/src/actions/mod.rs
Outdated
require!( | ||
self.min_writer_version != 7, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the first match arm guarantee this assertion can never fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This is the case where there was no writerFeatures
specified in the protocol. So I've switched this to be a check that we're on version 1 only (but maybe we need 2, see other comment)
let (store_37, engine_37, table_location_37) = setup("test_table_37", true); | ||
let (store_11, engine_11, table_location_11) = setup("test_table_11", true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do the different tables need different engines, out of curiosity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass the storage to the engine, so because we want disjoint storage, we need to have two engines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM one nit!
kernel/src/actions/mod.rs
Outdated
require!( | ||
self.min_writer_version == 1, | ||
Error::unsupported( | ||
"Currently delta-kernel-rs can only write to tables with min_writer_version = 1 or 7" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just to match spec (or maybe do delta.minWriterVersion
to match the user-facing table property?)
"Currently delta-kernel-rs can only write to tables with min_writer_version = 1 or 7" | |
"Currently delta-kernel-rs can only write to tables with protocol.minWriterVersion = 1 or 7" |
What changes are proposed in this pull request?
Our previous check was too strict. Now we just ensure that the protocol makes sense given what features are present/specified.
How was this change tested?
Made all existing
write.rs
tests also write to a protocol 1/1 table, and they all work.