Right now, when training a scalar btree index, we always divide the input into evenly sized pages. For example, given an input of [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 10] and a page size of 3, we would create pages:
[1, 2, 2]
[2, 2, 2]
[2, 2, 2]
[2, 5, 10]
There is no advantage to having multiple pages that all contain the same value. Ideally we would create pages like:
[1]
[2, 2, 2, 2, 2, 2, 2, 2, 2]
[5, 10]
This way, if the filter is x = 2, we only need to read one page instead of three. It would also simplify the logic in the btree index lookup.
This may seem like an unlikely occurrence, but it is very common when we have low cardinality columns, since we sort those columns first. In other words, if a column has only five distinct values, we should have only five pages, even if we have 100 million rows.
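One way to do this is to split the sorted input on value boundaries: accumulate runs of equal values into the current page, flush when the next run would overflow the page size, and give an oversized run its own page. The sketch below is illustrative only (the function name and structure are not from the Lance codebase), assuming the input is already sorted:

```python
def split_pages(sorted_values, page_size):
    """Split a sorted column into pages without splitting a run of
    equal values across pages; a run longer than page_size becomes
    a single (oversized) page. Hypothetical sketch, not Lance code."""
    pages = []
    current = []
    i = 0
    n = len(sorted_values)
    while i < n:
        # Find the run of equal values starting at i.
        j = i
        while j < n and sorted_values[j] == sorted_values[i]:
            j += 1
        run = sorted_values[i:j]
        # Flush the current page if the run would overflow it.
        if current and len(current) + len(run) > page_size:
            pages.append(current)
            current = []
        if len(run) >= page_size:
            # An oversized run fills a page by itself.
            pages.append(run)
        else:
            current.extend(run)
        i = j
    if current:
        pages.append(current)
    return pages


# The example from above: page size 3 yields [1], [2]*9, [5, 10].
print(split_pages([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 10], 3))
```

With this scheme an equality filter touches at most one page per distinct value, and a column with five distinct values produces at most five pages regardless of row count.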