Epic: Move-Stable Row Ids #2307

Closed
8 of 13 tasks
Tracked by #2079
wjones127 opened this issue May 6, 2024 · 2 comments
Labels: epic (A collection of issues with a certain theme)

Comments

Contributor

wjones127 commented May 6, 2024

Motivation

When we compact data files, the row ids change. This forces us to update the index files on every compaction. Updating the index files invalidates them in the cache, which degrades query performance. If row ids were stable when rows were moved, this would not happen.

Scope

This epic makes row ids stable after moving. It does not make them stable after updates. Rows that are updated will be deleted and appended under new ids.

A future epic will cover "primary keys", which will be the point at which row ids become stable after updates in addition to moves. This is kept out of scope for now to keep the workload of this epic manageable.

Design

In very simple terms:

  1. Add row ids as auto-incrementing u64 ids. The manifest will track max_row_id and assign ids in a similar process to how fragment ids are assigned during the commit loop.
  2. Each fragment metadata will contain a small row id index. This index maps from row id to row address. (Row address is what we currently call _rowid.) In most cases, such as after an append, this will be a simple range of values (max_row_id + 1)..(physical_rows + max_row_id + 1).
  3. Deletion files will be superseded by tombstones contained in the row id index. This cuts down on the total number of files to manage.
  4. A new feature flag will be introduced to make sure older readers don't try to interpret these new row ids.
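
A minimal sketch of steps 1 to 3, written in Rust purely for illustration. The type and method names (`Manifest::assign_row_ids`, `RowIdIndex`, `address_of`, `delete`) are hypothetical and are not the actual Lance API. The address layout (fragment id in the upper 32 bits, row offset in the lower 32 bits) is an assumption made for the example, and only the simple case of one contiguous id range per fragment is covered.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical stand-in for the manifest field named in the design.
#[derive(Debug)]
pub struct Manifest {
    /// Highest row id handed out so far.
    pub max_row_id: u64,
}

impl Manifest {
    /// Reserve ids for a newly appended fragment, as the commit loop might.
    /// Returns (max_row_id + 1)..(physical_rows + max_row_id + 1).
    pub fn assign_row_ids(&mut self, physical_rows: u64) -> Range<u64> {
        let start = self.max_row_id + 1;
        let end = start + physical_rows;
        // `end` is exclusive, so the last id handed out is `end - 1`.
        self.max_row_id = end - 1;
        start..end
    }
}

/// Hypothetical per-fragment row id index: row id -> row address.
#[derive(Debug)]
pub struct RowIdIndex {
    fragment_id: u32,
    /// Row ids in physical order; a plain range after an append.
    row_ids: Range<u64>,
    /// Deleted row ids, kept here instead of in a separate deletion file.
    tombstones: BTreeSet<u64>,
}

impl RowIdIndex {
    pub fn new(fragment_id: u32, row_ids: Range<u64>) -> Self {
        Self { fragment_id, row_ids, tombstones: BTreeSet::new() }
    }

    /// Translate a row id into a row address, or `None` if the id is not in
    /// this fragment or has been deleted.
    pub fn address_of(&self, row_id: u64) -> Option<u64> {
        if !self.row_ids.contains(&row_id) || self.tombstones.contains(&row_id) {
            return None;
        }
        let offset = row_id - self.row_ids.start;
        // Assumed layout: fragment id in the high 32 bits, offset in the low 32.
        Some(((self.fragment_id as u64) << 32) | offset)
    }

    /// Mark a row as deleted. Returns `true` if the id was live before.
    pub fn delete(&mut self, row_id: u64) -> bool {
        self.row_ids.contains(&row_id) && self.tombstones.insert(row_id)
    }
}
```

In this sketch, moving rows during compaction would only require rebuilding the `RowIdIndex` of the new fragment. The row ids themselves, and therefore any index entries that refer to them, stay the same.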

Plan

The following tasks have been moved into the Primary Keys epic:

  • Follow ups for stabilization
    • Replace custom bitmap implementation
    • Finalize serialization format
    • Optimize row id access given real benchmarks
  • External files and cleanup
    • Write out external files if large enough
    • Cleanup implementation

Week of August 12

wjones127 added the epic (A collection of issues with a certain theme) label May 6, 2024
wjones127 self-assigned this May 6, 2024
wjones127 added a commit that referenced this issue May 31, 2024
* Adds stable row ids to manifest
* Support writing row id sequences for append, overwrite, and update.

Epic: #2307
wjones127 changed the title from "Epic: Stable Row Ids" to "Epic: Move-Stable Row Ids" Jun 7, 2024
wjones127 added a commit that referenced this issue Jun 21, 2024
Part of #2307

* `Dataset::take_rows()` now takes `row_ids`, which may be stable row ids.
These are translated into row addresses internally and then passed to the
existing logic.
* `Fragment::take_rows()` now optionally returns a row address column, if
asked, instead of row id.
wjones127 added a commit that referenced this issue Jun 26, 2024
Part of #2307

* Turns on unit tests to validate we can use ANN and scalar indices with
move-stable row ids.
* Changed pre-filter to support move-stable row ids
* The major change is that the deletion mask is no longer always a block
list. With address-style row ids it still is, but with stable row ids
it will instead be an allow list.
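
To illustrate the difference between the two mask styles mentioned in this commit, here is a small Rust sketch. `RowIdMask` and `prefilter` are hypothetical names chosen for the example and are not taken from the Lance codebase. The sketch only shows how a pre-filter could treat a block list (drop the listed ids) versus an allow list (keep only the listed ids).

```rust
use std::collections::HashSet;

/// Hypothetical mask over row ids used by a pre-filter.
pub enum RowIdMask {
    /// Listed ids are excluded; everything else passes.
    BlockList(HashSet<u64>),
    /// Only listed ids pass; everything else is excluded.
    AllowList(HashSet<u64>),
}

impl RowIdMask {
    /// Returns `true` if the row id survives the mask.
    pub fn selected(&self, row_id: u64) -> bool {
        match self {
            RowIdMask::BlockList(blocked) => !blocked.contains(&row_id),
            RowIdMask::AllowList(allowed) => allowed.contains(&row_id),
        }
    }
}

/// Keep only the candidate row ids that pass the mask, preserving order.
pub fn prefilter(candidates: &[u64], mask: &RowIdMask) -> Vec<u64> {
    candidates
        .iter()
        .copied()
        .filter(|id| mask.selected(*id))
        .collect()
}
```

With a block list, an id the mask has never seen passes through. With an allow list, it is dropped, so callers need to know which kind of mask they hold.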
Contributor Author

wjones127 commented Aug 2, 2024

Week of July 29

Compaction and querying are waiting on review.

wjones127 added a commit that referenced this issue Aug 7, 2024
Contributor Author

Closing this, and continuing work in the #2454 epic.
