Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Store a compressed bitmap alongside the index pages #1552

Open
Tracked by #1553
westonpace opened this issue Nov 7, 2023 · 3 comments
Open
Tracked by #1553

Store a compressed bitmap alongside the index pages #1552

westonpace opened this issue Nov 7, 2023 · 3 comments

Comments

@westonpace
Copy link
Contributor

When we match a page in the btree search we then delegate to the sub-index to search that page. This works well for high cardinality data and equality / inequality searches. However, it can be inefficient when the data is low cardinality since we have to search a very large page (or many small pages)

Luckily, when we search the btree, we can know if a page is entirely included in the result. In other words, if a page has min: 5, max: 10 and the query is value < 20 then we don't need to search the page because we know we want all of the rows in the page. If we had stored a compressed bitmap of all the row ids in the page then all we would need to do would be to load that.

@westonpace
Copy link
Contributor Author

This issue has high synergy with #1551 . For example, consider a low cardinality column with only 5 choices and 100 million rows. The query is then something like color == 'blue'. Satisfying this query today is pretty expensive. We need to search all of the tiny pages that match that query.

On the other hand, if we only had five pages, and each page had their own compressed bitmap, then we could satisfy the query very quickly with a single IOP to read the compressed bitmap for the correct page.

This is basically giving us the benefits of both btree indices AND bitmap indices in one combined structure.

@westonpace
Copy link
Contributor Author

Note that this is likely going to be related to #2307 as well.

@westonpace
Copy link
Contributor Author

And #1887 suggests we should probably just avoid this scenario in most situations.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant