Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merge scalar index pages when all values are the same #1551

Open
Tracked by #1553
westonpace opened this issue Nov 7, 2023 · 0 comments
Open
Tracked by #1553

Merge scalar index pages when all values are the same #1551

westonpace opened this issue Nov 7, 2023 · 0 comments

Comments

@westonpace
Copy link
Contributor

Right now, when training a scalar btree index, we always divide the input up into evenly sized pages. For example, given an input of [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 10] and a page size of 3 we would create pages:

[1, 2, 2]
[2, 2, 2]
[2, 2, 2]
[2, 5, 10]

There is no advantage to having multiple pages that are all the same value. Ideally we would create pages like:

[1]
[2, 2, 2, 2, 2, 2, 2, 2, 2]
[5, 10]

This way, if the filter is x = 2 then we only need to read one page instead of reading three pages. It would also simplify the logic in the btree index lookup.

This may seem like an unlikely occurrence but it is very common when we have low cardinality columns since we sort those columns first. In other words, if we have a column that only has 5 choices, we should only have five pages, even if we have 100 million rows.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant