feat: allow compacting files into new format version #2749

Open
wants to merge 3 commits into
base: main

chebbyChefNEQ
Contributor

@chebbyChefNEQ chebbyChefNEQ commented Aug 18, 2024

A bootleg parallel migration tool built on the compaction task execution facility.

In the next PR, I'll add a force_rewrite option to rewrite files even when a file already has the desired number of rows/size.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 79.15%. Comparing base (7284521) to head (430bd81).

| Files | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/optimize.rs | 25.00% | 1 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2749      +/-   ##
==========================================
+ Coverage   79.13%   79.15%   +0.01%     
==========================================
  Files         227      227              
  Lines       67398    67402       +4     
  Branches    67398    67402       +4     
==========================================
+ Hits        53338    53353      +15     
+ Misses      10956    10948       -8     
+ Partials     3104     3101       -3     
| Flag | Coverage Δ |
|---|---|
| unittests | 79.15% <25.00%> (+0.01%) ⬆️ |


Contributor

@westonpace westonpace left a comment

We want to avoid datasets that are split between two different versions (e.g. some files in v1 and some in v2). I think this approach can cause that situation if only some files need to be compacted? That could be dangerous (although I think the write would fail in this situation).
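
To make the concern concrete, here is a minimal sketch (not Lance's actual code) of how rewriting only a subset of fragments into a new format leaves a dataset split across versions. `FormatVersion`, `Fragment`, `is_mixed`, and `compact_selected` are all hypothetical names invented for illustration, and the simulation keeps fragment ids unchanged, which real compaction would not do.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a file format version; not Lance's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FormatVersion {
    Legacy,
    Stable,
}

/// Hypothetical fragment record: an id plus the version its files use.
#[derive(Debug, Clone)]
pub struct Fragment {
    pub id: u64,
    pub version: FormatVersion,
}

/// Collects the distinct format versions present among the fragments.
pub fn versions_in_use(fragments: &[Fragment]) -> BTreeSet<FormatVersion> {
    fragments.iter().map(|f| f.version).collect()
}

/// A dataset is "mixed" when its fragments span more than one version.
pub fn is_mixed(fragments: &[Fragment]) -> bool {
    versions_in_use(fragments).len() > 1
}

/// Simulates a compaction that rewrites only `selected` fragments into
/// `target`, leaving every other fragment untouched.
pub fn compact_selected(
    fragments: &[Fragment],
    selected: &[u64],
    target: FormatVersion,
) -> Vec<Fragment> {
    fragments
        .iter()
        .map(|f| {
            if selected.contains(&f.id) {
                Fragment { id: f.id, version: target }
            } else {
                f.clone()
            }
        })
        .collect()
}
```

In this sketch, rewriting two of three legacy fragments yields a mixed dataset, while rewriting all three does not.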

Comment on lines +147 to +149
/// Whether to force generate the latest Stable format files version
/// for new fragments that are written during compaction.
pub force_migrate_legacy_format: bool,
Contributor

Why not just data_storage_version?

@chebbyChefNEQ
Contributor Author

> We want to avoid datasets that are split between two different versions (e.g. some files in v1 and some in v2). I think this approach can cause that situation if only some files need to be compacted? That could be dangerous (although I think the write would fail in this situation).

I see. Let me make this option implicitly rewrite the whole dataset then.
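
A rough sketch of what "implicitly rewrite the whole dataset" could look like at planning time, assuming a simple row-count threshold for normal compaction. Only the `force_migrate_legacy_format` name comes from this PR; `FragmentInfo`, `PlanOptions`, `select_for_rewrite`, and the `target_rows_per_fragment` field as used here are hypothetical and may not match the real planner.

```rust
/// Hypothetical fragment summary used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FragmentInfo {
    pub id: u64,
    pub num_rows: usize,
}

/// Hypothetical planning options; only `force_migrate_legacy_format`
/// is taken from the PR under review.
#[derive(Debug, Clone)]
pub struct PlanOptions {
    pub target_rows_per_fragment: usize,
    pub force_migrate_legacy_format: bool,
}

/// Returns the ids of fragments to rewrite.
///
/// Without the migration flag, only undersized fragments are picked.
/// With the flag set, every fragment is picked, so no fragment is left
/// behind in the old format.
pub fn select_for_rewrite(fragments: &[FragmentInfo], opts: &PlanOptions) -> Vec<u64> {
    fragments
        .iter()
        .filter(|f| {
            opts.force_migrate_legacy_format || f.num_rows < opts.target_rows_per_fragment
        })
        .map(|f| f.id)
        .collect()
}
```

The design choice here is that the migration flag overrides the size check entirely, which trades extra rewrite work for the guarantee that the dataset ends up in a single format version.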

Labels: enhancement (New feature or request), python