Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Drop aHash dependency in favor of hashbrown's choice of default hasher #17

Merged
merged 5 commits into from
Dec 9, 2024

Conversation

dtolnay
Copy link
Contributor

@dtolnay dtolnay commented Dec 9, 2024

The ahash crate is problematic (tkaitchuck/aHash#238, tkaitchuck/aHash#239, tkaitchuck/aHash#242) and not well maintained.

Since Interner<T> has to be able to hash arbitrary token types (everything ranging from long lines of code to individual codepoints, which have pretty different hashing performance characteristics) it is unlikely that any hasher choice made by this crate would be broadly better suitable than the general-purpose fast hasher chosen as default by hashbrown.

As a followup to this PR, it might be reasonable to parameterize Interner and InternedInput with an S: BuildHasher type parameter so that downstream code is able to benchmark various different hash implementations based on their own representative data.

Byron
Byron previously approved these changes Dec 9, 2024
Copy link
Collaborator

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this makes sense, particularly with the outlook of parameterising the hasher itself.

@Byron Byron requested a review from pascalkuthe December 9, 2024 17:50
src/intern.rs Outdated
@@ -182,3 +182,12 @@ impl<T: Hash + Eq> Index<Token> for Interner<T> {
&self.tokens[index.0 as usize]
}
}

// TODO: remove in favor of BuildHasher::hash_one once compilers older than 1.71
Copy link
Owner

@pascalkuthe pascalkuthe Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with bunping msrv for this, I loosly follow firefox MSRV and that is on 1.76 at this point (up to you to you want to do that in this PR, if lot let me know and I will merge as-is)

pascalkuthe
pascalkuthe previously approved these changes Dec 9, 2024
Copy link
Owner

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is simply a leftover from when ahash was the hashbrown default hasher (and fxhash suffers with strings which is the main usecase). Foldhash is a nice choice.

I think parameterizing by hasher is a good plan. I tend ot not be the biggest fan of it since it's quite viral in the codebase (and can't use new/with_capacity) so I prefer to have a good default choice (like hashbrown) but for downstream user to experiment/overwire it's definitly nice

@dtolnay dtolnay dismissed stale reviews from pascalkuthe and Byron via b133275 December 9, 2024 19:17
    warning: unnecessary closure used with `bool::then`
       --> src/myers/middle_snake.rs:248:9
        |
    248 |         (best_score > 0).then(|| (best_token_idx1, best_token_idx2))
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |
        = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations
        = note: `#[warn(clippy::unnecessary_lazy_evaluations)]` on by default
    help: use `then_some` instead
        |
    248 |         (best_score > 0).then_some((best_token_idx1, best_token_idx2))
        |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    warning: this `impl` can be derived
       --> src/lib.rs:233:1
        |
    233 | / impl Default for Algorithm {
    234 | |     fn default() -> Self {
    235 | |         Algorithm::Histogram
    236 | |     }
    237 | | }
        | |_^
        |
        = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derivable_impls
        = note: `#[warn(clippy::derivable_impls)]` on by default
    help: replace the manual implementation with a derive attribute and mark the default variant
        |
    175 + #[derive(Default)]
    176 ~ pub enum Algorithm {
    177 |     /// A variation of the [`patience` diff algorithm described by Bram Cohen's blog post](https://bramcohen.livejournal.com/73318.html)
    ...
    202 |     /// be used instead.
    203 ~     #[default]
    204 ~     Histogram,
        |
@pascalkuthe pascalkuthe merged commit 5c6534c into pascalkuthe:master Dec 9, 2024
5 checks passed
@dtolnay dtolnay deleted the hasher branch December 9, 2024 19:37
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants