Epic: Move-Stable Row Ids #2307
Labels
epic
A collection of issues with a certain theme
Comments
This was referenced May 23, 2024
wjones127 added a commit that referenced this issue on May 31, 2024:
* Adds stable row ids to manifest
* Support writing row id sequences for append, overwrite, and update.
Epic: #2307
This was referenced Jun 7, 2024
wjones127 added a commit that referenced this issue on Jun 21, 2024:
Part of #2307
* `Dataset::take_rows()` now takes `row_ids`, which may be stable row ids. These are translated into row addresses internally and then passed to the existing logic.
* `Fragment::take_rows()` now optionally returns the row address column, if asked, instead of the row id.
wjones127 added a commit that referenced this issue on Jun 26, 2024:
Part of #2307
* Turns on unit tests to validate we can use ANN and scalar indices with move-stable row ids.
* Changed pre-filter to support move-stable row ids.
* Major change is that the deletion mask is no longer always a block list. With address-style row ids it still is, but with stable row ids it will instead be an allow list.
This was referenced Jun 28, 2024
Week of July 29: Compaction and querying are waiting on review.
wjones127 added a commit that referenced this issue on Aug 7, 2024.
Closing this, and continuing work in the #2454 epic.
Motivation
When we compact data files, the row ids change. This forces us to update the index files on every compaction. Updating the index files invalidates them in the cache, which degrades query performance. If row ids were stable when rows were moved, this would not happen.
Scope
This epic makes row ids stable after moving. It does not make them stable after updates. Rows that are updated will be deleted and appended under new ids.
A future epic will cover "primary keys", at which point row ids will be stable after updates as well as moves. That is kept out of scope for now to keep the workload of this epic manageable.
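The distinction drawn in this scope (ids survive moves, but not updates) can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyTable`, `RowAddress`, and the counter-based id allocation are hypothetical names invented here, not Lance's actual types or API.

```rust
use std::collections::HashMap;

/// Physical location of a row: fragment id plus offset within the fragment.
/// Hypothetical type for illustration; not Lance's actual representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowAddress {
    pub fragment_id: u32,
    pub offset: u32,
}

/// Toy table that maps stable row ids to their current addresses.
#[derive(Debug, Default)]
pub struct ToyTable {
    next_row_id: u64, // assumption: ids come from a simple incrementing counter
    next_fragment_id: u32,
    addresses: HashMap<u64, RowAddress>,
}

impl ToyTable {
    /// Append `num_rows` rows as one new fragment and return their new row ids.
    pub fn append(&mut self, num_rows: u32) -> Vec<u64> {
        let fragment_id = self.next_fragment_id;
        self.next_fragment_id += 1;
        (0..num_rows)
            .map(|offset| {
                let id = self.next_row_id;
                self.next_row_id += 1;
                self.addresses.insert(id, RowAddress { fragment_id, offset });
                id
            })
            .collect()
    }

    /// Move every live row into one new fragment.
    /// Addresses change, but row ids do not.
    pub fn compact(&mut self) {
        let fragment_id = self.next_fragment_id;
        self.next_fragment_id += 1;
        let mut ids: Vec<u64> = self.addresses.keys().copied().collect();
        ids.sort_unstable();
        for (offset, id) in ids.into_iter().enumerate() {
            let offset = offset as u32;
            self.addresses.insert(id, RowAddress { fragment_id, offset });
        }
    }

    /// Update a row: delete the old row and append a replacement under a new id.
    /// Returns the new id, or `None` if `row_id` does not exist.
    pub fn update(&mut self, row_id: u64) -> Option<u64> {
        self.addresses.remove(&row_id)?;
        Some(self.append(1)[0])
    }

    /// Look up the current address of a row id, if the row is still live.
    pub fn address_of(&self, row_id: u64) -> Option<RowAddress> {
        self.addresses.get(&row_id).copied()
    }
}
```

In this model, calling `compact()` changes what `address_of(id)` returns while the id itself remains valid, so anything keyed by row id would not need rewriting. Calling `update(id)` retires the old id and hands back a fresh one, which is the behavior this epic leaves in place.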
Design
In very simple terms:
* [...] `max_row_id` and as [...] similar process as fragment ids are assigned during the commit loop.
* [...] `_rowid`.) In most cases, such as after an append, this will be a simple range of values: `(max_row_id + 1)..(physical_rows + max_row_id + 1)`.
* Deletion files will be superseded by tombstones contained in the row id index. This cuts down on the total number of files to manage.
Plan
The following tasks have been moved into the Primary Keys epic:
Week of August 12