Right now, when training a scalar btree index, we always divide the input into evenly sized pages. For example, given an input of [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 10] and a page size of 3, we would create pages:
[1, 2, 2]
[2, 2, 2]
[2, 2, 2]
[2, 5, 10]
There is no advantage to having multiple pages that all contain the same value. Ideally we would create pages like:
[1]
[2, 2, 2, 2, 2, 2, 2, 2, 2]
[5, 10]
This way, if the filter is x = 2, we only need to read one page instead of three. It would also simplify the logic in the btree index lookup.
This may seem like an unlikely occurrence, but it is very common when we have low cardinality columns, since we sort those columns first. In other words, if a column has only five distinct values, we should have only five pages, even if we have 100 million rows.
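One way to do this is to split the sorted input on value boundaries: accumulate runs of equal values into the current page, flush when the next run would overflow the page size, and give an oversized run its own page. The sketch below is illustrative only (the function name and structure are not from the Lance codebase), assuming the input is already sorted:

```python
def split_pages(sorted_values, page_size):
    """Split a sorted column into pages without splitting a run of
    equal values across pages; a run longer than page_size becomes
    a single (oversized) page. Hypothetical sketch, not Lance code."""
    pages = []
    current = []
    i = 0
    n = len(sorted_values)
    while i < n:
        # Find the run of equal values starting at i.
        j = i
        while j < n and sorted_values[j] == sorted_values[i]:
            j += 1
        run = sorted_values[i:j]
        # Flush the current page if the run would overflow it.
        if current and len(current) + len(run) > page_size:
            pages.append(current)
            current = []
        if len(run) >= page_size:
            # An oversized run fills a page by itself.
            pages.append(run)
        else:
            current.extend(run)
        i = j
    if current:
        pages.append(current)
    return pages


# The example from above: page size 3 yields [1], [2]*9, [5, 10].
print(split_pages([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 10], 3))
```

With this scheme an equality filter touches at most one page per distinct value, and a column with five distinct values produces at most five pages regardless of row count.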