-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
perf: increase min buckets on very small types #524
base: master
Are you sure you want to change the base?
Conversation
192f281
to
f637220
Compare
Also see #47 for previous discussion of this problem. |
Consider `HashSet<u8>` on x86_64 with SSE with various bucket sizes and how many bytes the allocation ends up being: | buckets | capacity | allocated bytes | | ------- | -------- | --------------- | | 4 | 3 | 36 | | 8 | 7 | 40 | | 16 | 14 | 48 | | 32 | 28 | 80 | In general, doubling the number of buckets should roughly double the number of bytes used. However, for small bucket sizes for these small TableLayouts (4 -> 8, 8 -> 16), it doesn't happen. This is an edge case which happens because of padding of the control bytes and adding the Group::WIDTH. Taking the buckets from 4 to 16 (4x) only takes the allocated bytes from 36 to 48 (~1.3x). This platform isn't the only one with edges. Here's aarch64 on an M1 for the same `HashSet<u8>`: | buckets | capacity | allocated bytes | | ------- | -------- | --------------- | | 4 | 3 | 20 | | 8 | 7 | 24 | | 16 | 14 | 40 | Notice 4 -> 8 buckets leading to only 4 more bytes (20 -> 24) instead of roughly doubling. Generalized, `buckets * table_layout.size` needs to be at least as big as `table_layout.ctrl_align`. For the cases I listed above, we'd get these new minimum bucket sizes: - x86_64 with SSE: 16 - aarch64: 8 This is a niche optimization. However, it also removes possible undefined behavior edge case in resize operations. In addition, it may be a useful property to utilize over-sized allocations (see rust-lang#523).
f637220
to
ef4df57
Compare
d00fcaa
to
c45e080
Compare
let cap = cap.max(match (Group::WIDTH, table_layout.size) { | ||
(16, 0..=1) => 14, | ||
(16, 2..=3) => 7, | ||
(8, 0..=1) => 7, | ||
_ => 3, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could also be written this way. Not sure what is better:
let cap = cap.max(match (Group::WIDTH, table_layout.size) { | |
(16, 0..=1) => 14, | |
(16, 2..=3) => 7, | |
(8, 0..=1) => 7, | |
_ => 3, | |
}); | |
let cap = cap.max(match table_layout.size { | |
0..=1 if Group::WIDTH == 16 => 14, | |
2..=3 if Group::WIDTH == 16 => 7, | |
0..=1 if Group::WIDTH == 8 => 7, | |
_ => 3, | |
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I had in mind was to move the max
check inside the if cap < 8
block entirely (and changing the bound to 15). Then just have hard-coded selection depending on the requested capacity and the group width/item size.
Consider
HashSet<u8>
on x86_64 with SSE with various bucket sizes and how many bytes the allocation ends up being:In general, doubling the number of buckets should roughly double the number of bytes used. However, for small bucket sizes for these small
TableLayout
s (4 -> 8, 8 -> 16), it doesn't happen. This is an edge case which happens because of padding of the control bytes and adding theGroup::WIDTH
. Taking the buckets from 4 to 16 (4x) only takes the allocated bytes from 36 to 48 (~1.3x).This platform isn't the only one with edges. Here's aarch64 on an M1 for the same
HashSet<u8>
:Notice doubling from 4 to 8 buckets only lead to 4 more bytes (20 -> 24) instead of roughly doubling.
Generalized,
buckets * table_layout.size
needs to be at least as big astable_layout.ctrl_align
. For the cases I listed above, we'd get these new minimum bucket sizes:This is a niche optimization. However, it also removes possible undefined behavior edge case in resize operations. In addition, it would be a useful property when utilizing over-sized allocations (see #523).