Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

perf: increase min buckets on very small types #524

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

morrisonlevi
Copy link

@morrisonlevi morrisonlevi commented May 7, 2024

Consider HashSet<u8> on x86_64 with SSE with various bucket sizes and how many bytes the allocation ends up being:

buckets capacity allocated bytes
4 3 36
8 7 40
16 14 48
28 32 80

In general, doubling the number of buckets should roughly double the number of bytes used. However, for small bucket sizes for these small TableLayouts (4 -> 8, 8 -> 16), it doesn't happen. This is an edge case which happens because of padding of the control bytes and adding the Group::WIDTH. Taking the buckets from 4 to 16 (4x) only takes the allocated bytes from 36 to 48 (~1.3x).

This platform isn't the only one with edges. Here's aarch64 on an M1 for the same HashSet<u8>:

buckets capacity allocated bytes
4 3 20
8 7 24
16 14 40

Notice doubling from 4 to 8 buckets only lead to 4 more bytes (20 -> 24) instead of roughly doubling.

Generalized, buckets * table_layout.size needs to be at least as big as table_layout.ctrl_align. For the cases I listed above, we'd get these new minimum bucket sizes:

  • x86_64 with SSE: 16
  • aarch64: 8

This is a niche optimization. However, it also removes possible undefined behavior edge case in resize operations. In addition, it would be a useful property when utilizing over-sized allocations (see #523).

@Amanieu
Copy link
Member

Amanieu commented Jun 17, 2024

Also see #47 for previous discussion of this problem.

Consider `HashSet<u8>` on x86_64 with SSE with various bucket sizes and
how many bytes the allocation ends up being:

| buckets | capacity | allocated bytes |
| ------- | -------- | --------------- |
|       4 |        3 |              36 |
|       8 |        7 |              40 |
|      16 |       14 |              48 |
|      32 |       28 |              80 |

In general, doubling the number of buckets should roughly double the
number of bytes used. However, for small bucket sizes for these small
TableLayouts (4 -> 8, 8 -> 16), it doesn't happen. This is an edge case
which happens because of padding of the control bytes and adding the
Group::WIDTH. Taking the buckets from 4 to 16 (4x) only takes the
allocated bytes from 36 to 48 (~1.3x).

This platform isn't the only one with edges. Here's aarch64 on an M1
for the same `HashSet<u8>`:

| buckets | capacity | allocated bytes |
| ------- | -------- | --------------- |
|       4 |        3 |              20 |
|       8 |        7 |              24 |
|      16 |       14 |              40 |

Notice 4 -> 8 buckets leading to only 4 more bytes (20 -> 24) instead
of roughly doubling.

Generalized, `buckets * table_layout.size` needs to be at least as big
as `table_layout.ctrl_align`. For the cases I listed above, we'd get
these new minimum bucket sizes:

 - x86_64 with SSE: 16
 - aarch64: 8

This is a niche optimization. However, it also removes possible
undefined behavior edge case in resize operations. In addition, it
may be a useful property to utilize over-sized allocations (see
rust-lang#523).
Comment on lines +216 to +221
let cap = cap.max(match (Group::WIDTH, table_layout.size) {
(16, 0..=1) => 14,
(16, 2..=3) => 7,
(8, 0..=1) => 7,
_ => 3,
});
Copy link
Author

@morrisonlevi morrisonlevi Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be written this way. Not sure what is better:

Suggested change
let cap = cap.max(match (Group::WIDTH, table_layout.size) {
(16, 0..=1) => 14,
(16, 2..=3) => 7,
(8, 0..=1) => 7,
_ => 3,
});
let cap = cap.max(match table_layout.size {
0..=1 if Group::WIDTH == 16 => 14,
2..=3 if Group::WIDTH == 16 => 7,
0..=1 if Group::WIDTH == 8 => 7,
_ => 3,
});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had in mind was to move the max check inside the if cap < 8 block entirely (and changing the bound to 15). Then just have hard-coded selection depending on the requested capacity and the group width/item size.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants