-
Notifications
You must be signed in to change notification settings - Fork 13.4k
std::hash::Hash
documentation should suggest that the hash data should be prefix-free
#89429
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Comments
Should hand-implementors of struct User {
name: String,
age: u32,
cool: bool,
}
impl Hash for User {
fn hash<H: Hasher>(&self, state: &mut H) {
self.name.hash(state);
self.age.hash(state);
self.cool.hash(state);
}
} There are no magic bytes here, as in |
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
#[derive(Hash)]
struct User {
name: String,
age: u32,
cool: bool,
}
struct Boozer {
name: String,
age: u32,
cool: bool,
}
impl Hash for Boozer {
fn hash<H: Hasher>(&self, state: &mut H) {
self.name.hash(state);
self.age.hash(state);
self.cool.hash(state);
}
}
fn hash<H: Hash>(h: H) {
let mut hasher = DefaultHasher::new();
h.hash(&mut hasher);
println!("{}", hasher.finish());
}
fn main() {
let user = User {
name: "Jack".to_string(),
age: 8,
cool: true,
};
let boozer = Boozer {
name: "Jack".to_string(),
age: 8,
cool: true,
};
let tuple = ("Jack".to_string(), 8, true);
hash(user);
hash(boozer);
hash(tuple);
} The results are:
Is this expected? |
Different types hashing to the same value are not a problem. What is a problem is different (unequal) values of the same type hashing to the same value (for all |
@rustbot claim |
Collisions are inevitable, and it shouldn't matter between I do agree that types should try to avoid collisions, as long as we make this kind of thing a suggestion of best practice, not stated as a hard requirement of the trait. |
Collisions in One benefit of not creating collisions in |
OK, that's a compelling distinction between the responsibilities of Is it always possible to do this perfectly on the |
[Trying again to set labels since I typoed the first one:] |
I am not sure it's always possible, in theory you could imagine some contrived I think it would be fine to say that if can't come up with a proper |
For the docs, would it make sense to discuss the case of rust/library/core/src/hash/mod.rs Lines 246 to 249 in ed93759
The paragraph could start with something like:
|
Let's then note that |
I gave this a try! I might not have understood correctly, however... |
FWIW, being prefix-free can also be unfortunate sometimes. For example, I don't know that there's a great answer for this, though... |
See #89443. |
Is there a real reason In any case, slices are already prefix-free (as long as T is) (and I think that's good, otherwise you'd have a lot of collisions on different short slices), so this issue doesn't change that. |
Yes, |
Nominated this for @rust-lang/libs-api discussion. |
@tczajka I added a comment to the impl in #86140 after I forget why it was that way too :) |
One possibility would be to leave the choice of whether to add a prefix to the This could be done with an optional method on the trait Hasher {
// Option 1
fn should_add_prefix(&self) -> bool { false }
// Option 2
fn write_prefix(&mut self, prefix: usize) { self.write_usize(prefix); }
} |
I dispute the idea that making the Potential huge cost of resulting hash collisions (for certain kinds of data) is likely to greatly overwhelm any such (tiny) savings from lazy hashing. I think of this as a bug because it defeats the assumptions of mathematical analysis of the performance properties of hash tables with certain hashers. |
docs: `std::hash::Hash` should ensure prefix-free data Attempt to synthesize the discussion in rust-lang#89429 into a suggestion regarding `Hash` implementations (not a hard requirement). Closes rust-lang#89429.
Uh oh!
There was an error while loading. Please reload this page.
As discussed at Why does
str.hash(…)
pass an extra byte to the Hasher?: the code ofimpl Hash for &str
specifically passes an extra 0xFF byte to the Hasher, so that values like("ab", "c")
and("a", "bc")
hash differently (added in 6066118).This is a subtle property of hashing and should probably be mentioned in the documentation for
Hash
— particularly as the documentation forHasher
already says “you cannot assume, for example, that awrite_u32
call is equivalent to four calls ofwrite_u8
” which could be sloppily interpreted as an expectation that a goodHasher
implementation will handle “quoting” of sequential calls itself.If an implementor of
Hash
fails to have this property, it could compromise the hash-DoS resistance that Rust tries to offer by default.@rustbot labels: +A-docs +T-libs-api +C-enhancement
The text was updated successfully, but these errors were encountered: