ORC writer API changes for granular statistics #10058
The erstwhile ORC writer API exposed only a binary choice for statistics: ENABLED or DISABLED. This commit lets callers of the ORC writer further choose whether statistics are collected at the ROW_GROUP or STRIPE level.
…fea-rowgroup-stats
This branch includes the changes from #10041, since it's based on |
Looks good, just a few small suggestions.
Kind of scary that tests did not require any changes. I'd like to add a test with ~1M rows and various data types that runs with different stat frequency. Would be even better to reuse an existing test, will look into this.
Force-pushed from 19e7208 to 493df76.
(Please pardon the force push.)
This surfaced when attempting to add a non-default statistics frequency to |
Thanks, Vukasin!
rerun tests |
I'm working on resolving the merge conflict in |
🔥 🔥
Merging shortly.
@gpucibot merge
I've just merged this change.
Depends on #10041.
This commit also includes the relevant changes to java/ and python/.
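To make the shape of the API change concrete, here is a minimal, self-contained sketch of moving from an on/off statistics flag to a three-way setting. It is not the code from this PR: `StatisticsFreq`, `from_legacy_flag`, and `levels_collected` are hypothetical names, and two behaviours are assumptions rather than facts from the description: that the old ENABLED corresponds to the finest (ROW_GROUP) level, and that choosing ROW_GROUP also collects stripe-level statistics.

```python
from enum import Enum


class StatisticsFreq(Enum):
    """Hypothetical stand-in for the writer's statistics setting."""

    NONE = "NONE"            # no statistics (the old DISABLED)
    STRIPE = "STRIPE"        # statistics per stripe
    ROW_GROUP = "ROW_GROUP"  # statistics per row group


def from_legacy_flag(enabled: bool) -> StatisticsFreq:
    """Map the old binary choice onto the new setting.

    Assumption: the old ENABLED maps to the finest level, ROW_GROUP.
    """
    return StatisticsFreq.ROW_GROUP if enabled else StatisticsFreq.NONE


def levels_collected(freq: StatisticsFreq) -> set:
    """Return the statistics levels a writer would collect for `freq`.

    Assumption: a finer setting also includes the coarser stripe level.
    """
    if freq is StatisticsFreq.NONE:
        return set()
    if freq is StatisticsFreq.STRIPE:
        return {"stripe"}
    return {"stripe", "row_group"}


if __name__ == "__main__":
    for freq in StatisticsFreq:
        print(freq.name, sorted(levels_collected(freq)))
```

A caller that used to pass a boolean could go through `from_legacy_flag` once at the API boundary, so the rest of the writer only ever sees the three-way setting.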