-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Drop aHash dependency in favor of hashbrown's choice of default hasher #17
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this makes sense, particularly with the outlook of parameterising the hasher itself.
src/intern.rs
Outdated
@@ -182,3 +182,12 @@ impl<T: Hash + Eq> Index<Token> for Interner<T> { | |||
&self.tokens[index.0 as usize] | |||
} | |||
} | |||
|
|||
// TODO: remove in favor of BuildHasher::hash_one once compilers older than 1.71 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with bunping msrv for this, I loosly follow firefox MSRV and that is on 1.76 at this point (up to you to you want to do that in this PR, if lot let me know and I will merge as-is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is simply a leftover from when ahash was the hashbrown default hasher (and fxhash suffers with strings which is the main usecase). Foldhash is a nice choice.
I think parameterizing by hasher is a good plan. I tend ot not be the biggest fan of it since it's quite viral in the codebase (and can't use new/with_capacity) so I prefer to have a good default choice (like hashbrown) but for downstream user to experiment/overwire it's definitly nice
For std::hash::BuildHasher::hash_one. https://doc.rust-lang.org/std/hash/trait.BuildHasher.html#method.hash_one
warning: unnecessary closure used with `bool::then` --> src/myers/middle_snake.rs:248:9 | 248 | (best_score > 0).then(|| (best_token_idx1, best_token_idx2)) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations = note: `#[warn(clippy::unnecessary_lazy_evaluations)]` on by default help: use `then_some` instead | 248 | (best_score > 0).then_some((best_token_idx1, best_token_idx2)) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: this `impl` can be derived --> src/lib.rs:233:1 | 233 | / impl Default for Algorithm { 234 | | fn default() -> Self { 235 | | Algorithm::Histogram 236 | | } 237 | | } | |_^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derivable_impls = note: `#[warn(clippy::derivable_impls)]` on by default help: replace the manual implementation with a derive attribute and mark the default variant | 175 + #[derive(Default)] 176 ~ pub enum Algorithm { 177 | /// A variation of the [`patience` diff algorithm described by Bram Cohen's blog post](https://bramcohen.livejournal.com/73318.html) ... 202 | /// be used instead. 203 ~ #[default] 204 ~ Histogram, |
The
ahash
crate is problematic (tkaitchuck/aHash#238, tkaitchuck/aHash#239, tkaitchuck/aHash#242) and not well maintained.Since
Interner<T>
has to be able to hash arbitrary token types (everything ranging from long lines of code to individual codepoints, which have pretty different hashing performance characteristics) it is unlikely that any hasher choice made by this crate would be broadly better suitable than the general-purpose fast hasher chosen as default byhashbrown
.As a followup to this PR, it might be reasonable to parameterize
Interner
andInternedInput
with anS: BuildHasher
type parameter so that downstream code is able to benchmark various different hash implementations based on their own representative data.