hashmap: Store hashes as usize internally #36595
Conversation
r? @brson (rust_highfive has picked a reviewer for you, use r? to override)
Needs benchmarks and testing on 32-bit.
-- updated benchmarks below
@bluss we discussed this briefly at libs triage today and were curious, do you have some representative numbers as well about this change? (or @arthurprs do you have some?) Would be useful to see the comparisons! I was personally a little worried about losing 32 bits of a 64-bit hash on 32-bit platforms, but so long as we generally recommend a uniform distribution of bits I think it'll work out.
// We need to avoid 0 in order to prevent collisions with
// EMPTY_HASH. We can maintain our precious uniform distribution
// of initial indexes by unconditionally setting the MSB,
// effectively reducing 64-bit hashes to 63 bits.
This comment needs updating because of the 32 -> 31?
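For readers following along, here is a minimal sketch of the scheme the quoted comment describes, assuming a SafeHash-style wrapper and an EMPTY_HASH sentinel of 0 as in the comment above; it is an illustration of the idea, not the exact PR code.

```rust
use std::mem::size_of;

/// Sentinel for an empty bucket; real hashes must never equal this value.
const EMPTY_HASH: usize = 0;

#[derive(Copy, Clone, PartialEq)]
struct SafeHash {
    hash: usize,
}

fn make_safe_hash(full_hash: u64) -> SafeHash {
    // Keep only the platform word's worth of bits; on a 32-bit target the
    // upper 32 bits of the hasher output are discarded.
    let truncated = full_hash as usize;
    // Unconditionally set the MSB so the stored value can never be 0
    // (EMPTY_HASH). This is the "64 -> 63 bits" reduction from the quoted
    // comment, which becomes "32 -> 31 bits" on 32-bit targets after this PR.
    let msb = 1usize << (size_of::<usize>() * 8 - 1);
    SafeHash { hash: truncated | msb }
}
```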
Here are some x86 benchmarks. Keep in mind that most of these are … What I don't get is why in the hhkkvv layout the iteration becomes slower; I'll try to dig something out of the assembly. -- benchmarks removed, look for updates below
I wanted to know how it influences bootstrap time on both 64-bit and 32-bit; it would be a good metric. I don't have the resources to do the comparison, so I left this here to ask if someone wanted to try it. Ah, the benchmarks are for the kkvv layout (current) and kvkv (your PR).
I looked at the disassembly but I can't see anything that would make iteration that much slower. From an algorithm POV it makes no sense to me.
@arthurprs those numbers seem to be mostly related to performance, but isn't the theory behind this change that it mostly saves memory?
@alexcrichton sure, I just happened to have these around due to the other PR, and the decrease in cache pressure shows up in these. As for the memory benefit, it's easy to calculate as a function of sizeof(K) and sizeof(V), since it doesn't depend on other factors like padding.
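To put rough numbers on that, here is a small hedged sketch of the per-entry footprint; it is my own illustration, not code from the PR, and it assumes hashes and key/value pairs live in separate arrays (which, if I recall correctly, the std RawTable of that era did), so padding between the hash and the pair is not a factor.

```rust
use std::mem::size_of;

// Rough per-entry footprint of the table's arrays, before and after this
// PR, ignoring capacity rounding and the load factor.
fn entry_bytes_before<K, V>() -> usize {
    size_of::<u64>() + size_of::<K>() + size_of::<V>()
}

fn entry_bytes_after<K, V>() -> usize {
    size_of::<usize>() + size_of::<K>() + size_of::<V>()
}

fn main() {
    // Example: HashMap<u32, u32>. On a 32-bit target this drops from
    // 16 bytes per entry to 12 bytes (a 25% reduction); on a 64-bit
    // target usize == u64, so nothing changes.
    println!("before: {}", entry_bytes_before::<u32, u32>());
    println!("after:  {}", entry_bytes_after::<u32, u32>());
}
```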
@alexcrichton Smaller data means better performance; more hashes fit into a cache line during lookup. So performance was my main thought.
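As a back-of-the-envelope illustration (my own numbers, assuming 64-byte cache lines and the hashes laid out contiguously): a probe sequence scanning the hash array sees 8 hashes per cache line when they are u64, but 16 per line when they are 32-bit usize, so a 32-bit target fetches roughly half as many cache lines for the same probe length.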
Dealing with so many benchmarks is tricky. I found a problem, so I'm repeating the benchmarks; I'll post fixed and expanded updates soon.
Here are the updated results. And the gist with even more comparisons and better visualization: https://gist.github.com/arthurprs/9f28847dceee86bd5cfffcd30d9cd6cc
Code: https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize
@arthurprs hm, some of those benchmarks seem worrisome, indicating that iteration gets 40% slower? Is that just an outlier though?
@alexcrichton yeah, it's super odd (and makes no sense to me) but I can reproduce it every time here. I was hoping somebody else could run the benchmarks. It's just a matter of checking out these branches (https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize) and running something like …
@rfcbot fcp merge
Another neat improvement to our hash tables, and seems like the numbers back it up?
☔ The latest upstream changes (presumably #36753) made this pull request unmergeable. Please resolve the merge conflicts.
rfcbot stuck?
I'm waiting for @arthurprs's hashmap change to merge first before updating. I think the benchmarks said this was a universal win for that memory layout(?), so it should be a simple decision then.
Ah yeah we've discussed this as well, so it's ok to r+ when ready to go.
Force-pushed from 4283919 to 6707668
We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This should be an improvement for the size of the data structure on 32-bit platforms. Smaller data means better cache utilization and hopefully better performance.
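A hedged sketch of the reasoning (my own illustration, not code from the PR): the table capacity is a usize, and for a power-of-two-sized table the bucket index is just a mask of the hash, so bits beyond the platform word can never influence which bucket is chosen.

```rust
// Bucket selection in a power-of-two-sized table: only the low
// usize-width bits of the hash can affect the result, so storing the
// full 64-bit hash on a 32-bit target buys nothing for lookup.
fn bucket_index(hash: u64, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    (hash as usize) & (capacity - 1)
}
```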
Force-pushed from 6707668 to 13a1f21
Measuring with compare.py from rustc-benchmarks would be useful here.
@bluss this is ready to go now, right?
It is ready to go. I haven't invested more time in benchmarking this, and that wasn't the plan from the start either; I wanted to simply implement this and give it to the rustc developers if they were interested. @arthurprs's benchmarks show surprisingly noticeable gains, so I'm happy.
@bors: r+
Ok, thanks!
📌 Commit 13a1f21 has been approved by |
hashmap: Store hashes as usize internally
We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This should be an improvement for the size of the data structure on 32-bit platforms. Smaller data means better cache utilization and hopefully better performance. Fixes #36567
The gains very likely come from the smaller hash array memory footprint and the cheaper displacement() calculation. The latter is in a very hot code path. Edit: actually the displacement may not be affected, but the patch is probably saving 64-bit arithmetic and freeing an extra register in other places.
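For context, here is a hedged sketch of what a Robin Hood displacement (probe distance) computation looks like; this is my own illustration of the idea, not the function from RawTable, and it shows why keeping the stored hash at usize keeps the hot-path arithmetic in native words even on 32-bit targets.

```rust
// Hedged sketch of a Robin Hood "displacement" (probe distance) computation.
// With hashes stored as usize, everything here is native-word arithmetic.
fn displacement(stored_hash: usize, actual_index: usize, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    // The bucket this entry would ideally occupy.
    let ideal_index = stored_hash & (capacity - 1);
    // Wrapping subtraction handles entries that wrapped past the end of
    // the table while probing.
    actual_index.wrapping_sub(ideal_index) & (capacity - 1)
}
```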