-
Notifications
You must be signed in to change notification settings - Fork 52
Replace hash with faster and better finalized hash #37
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
[DO NOT MERGE] perf run for rustc-hash candidate See rust-lang/rustc-hash#37.
Note that should this get merged for optimal performance the |
[DO NOT MERGE] perf run for rustc-hash candidate See rust-lang/rustc-hash#37.
My recollection is that rustc doesn't hash a lot of strings so issue 1 didn't seem very impactful in practice, did your investigations show the opposite maybe? FWIW, a while back I also tried this same hash function for the integers, but with a different constant from the same paper and various finalization ops -- as such a sequence is simple enough that my rust lookalike of hash-prospector finds it very quickly. It wasn't an obvious improvement over the current hash at the time, and had less promising results than #18. (Edit: for comparison, here are #18's results, cycles here https://perf.rust-lang.org/compare.html?start=d53f1e8fbf891cf84fcb11eb078a27e528df795a&end=71812917089730aa5abc2f6c5978c6718c9b0f5f&stat=cycles%3Au) |
The performance run for this PR: https://perf.rust-lang.org/compare.html?start=0160bff4b1bffa241299aba8c8c63e7a3cd871fe&end=c358310bb0ab238e25e60f1e77f96e0a2b270d0b |
@lqd The integer hash is the more important one for sure, but there are also some string hashes in the top 99% of usage:
|
It would be good to squash all the commits together before merging this, because they're not logically separate changes. |
With the additional clarifications via comments and squashing I am happy with this. I don't normally give r+ to PRs in this repo so I will leave that to @Nilstrieb(?) |
@orlp And thanks for answering my questions and helping me get past my initial skepticism :) There have been multiple unsuccessful attempts by multiple people to improve on FxHash, some mentioned in this blog post, so it's nice to finally see success! |
Do you think it might be worth renaming |
|
It's just a whole lot of pain for little gain. I think my comment in the readme is fine (and also explains why its called
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, thank you very much! I doubted you at RustNL but I will never doubt you again😊😊
This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133.
This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133.
…ogaloo, r=<try> Use new faster fxhash. This was implemented in rust-lang/rustc-hash#37 by Orson Peters and benchmarked in rust-lang#125133. r? `@ghost` `@bors` try `@rust-timer` queue
Update rustc-hash to version 2 This brings in the new algorithm. see rust-lang/rustc-hash#37 and rust-lang#125133
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
Update rustc-hash to version 2 This brings in the new optimized algorithm that was shown to have small performance benefits for rustc. I haven't run the rust-analyzer benchmarks. See rust-lang/rustc-hash#37.
The current FxHasher design has a few issues:
i
do not affect bits less significant thani
. Since most hash table implementations (including hashbrown) compute their bucket indices from the least significant bits, this means everything can collide for a hash table of size 2^n if all differences are in bits n or above.With this PR I aim to fix 1 by adding a proper string hash, and mitigate 2 by adding a bit rotation as a finalizer. This bit rotation moves the high-entropy high bits into the low bits expected by hash table implementations. I also simplified the hasher to a polynomial hash, which further improved compile times and makes the hasher less of an arbitrary mish-mash of operations and more a combination of justifiable and understandable components.
I will make a dummy PR on rust-lang to do a perf run for this PR.