optimize Hash for Path
Hashing does not have to use the whole Components parsing machinery because it only needs to match the
normalizations that Components performs:

* stripping redundant separators -> skipping separators
* stripping redundant '.' directories -> skipping a '.' that follows a separator

That's all it takes.
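
As a rough illustration of the two skipping rules, here is a minimal sketch that collects the bytes a hasher would see instead of hashing them. The function name `normalized_bytes` is hypothetical, and the sketch assumes `/` is the only separator (the real code uses `is_sep_byte`).

```rust
/// Hypothetical helper, not part of std: returns the bytes that would be
/// fed to the hasher after skipping separators and '.' items that follow
/// a separator. Assumes '/' is the only separator byte.
fn normalized_bytes(path: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut component_start = 0;

    for i in 0..path.len() {
        if path[i] == b'/' {
            // Emit the component that ends just before this separator.
            if i > component_start {
                out.extend_from_slice(&path[component_start..i]);
            }

            // Skip the separator, plus a '.' that is immediately followed
            // by another separator or by the end of the path.
            component_start = i + match path[i..] {
                [_, b'.', b'/', ..] | [_, b'.'] => 2,
                _ => 1,
            };
        }
    }

    // Emit the trailing component, if any.
    if component_start < path.len() {
        out.extend_from_slice(&path[component_start..]);
    }

    out
}
```

With this sketch, `foo//./bar` and `foo/bar` yield the same bytes, and a leading `.` (as in `./foo`) is kept because it does not follow a separator.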

And instead of hashing individual slices for each component, we feed the bytes directly into the hasher, which avoids
hashing the length of each component in addition to its contents.
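
To make the length overhead concrete, the sketch below counts how many bytes reach a hasher in each case. `ByteCounter` and the two helper functions are hypothetical names introduced for illustration, and a plain `&[u8]` stands in for a path component here.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that only counts the bytes it is given.
#[derive(Default)]
struct ByteCounter {
    bytes: usize,
}

impl Hasher for ByteCounter {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes += bytes.len();
    }

    fn finish(&self) -> u64 {
        self.bytes as u64
    }
}

/// Bytes seen by the hasher when the slice goes through its `Hash` impl,
/// which also feeds the slice length to the hasher.
fn bytes_via_hash_impl(data: &[u8]) -> usize {
    let mut h = ByteCounter::default();
    data.hash(&mut h);
    h.bytes
}

/// Bytes seen by the hasher when the contents are written directly.
fn bytes_via_write(data: &[u8]) -> usize {
    let mut h = ByteCounter::default();
    h.write(data);
    h.bytes
}
```

For any input, `bytes_via_hash_impl` reports more bytes than `bytes_via_write`, since the latter passes only the contents. The commit pays for a length once per path (the final `write_usize`) rather than once per component.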
the8472 committed Nov 9, 2021
1 parent 82b4544 commit a083dd6
Showing 1 changed file with 28 additions and 2 deletions.
30 changes: 28 additions & 2 deletions library/std/src/path.rs
@@ -2873,9 +2873,35 @@ impl cmp::PartialEq for Path {
 #[stable(feature = "rust1", since = "1.0.0")]
 impl Hash for Path {
     fn hash<H: Hasher>(&self, h: &mut H) {
-        for component in self.components() {
-            component.hash(h);
+        let bytes = self.as_u8_slice();
+
+        let mut component_start = 0;
+        let mut bytes_hashed = 0;
+
+        for i in 0..bytes.len() {
+            if is_sep_byte(bytes[i]) {
+                if i > component_start {
+                    let to_hash = &bytes[component_start..i];
+                    h.write(to_hash);
+                    bytes_hashed += to_hash.len();
+                }
+
+                // skip over separator and optionally a following CurDir item
+                // since components() would normalize these away
+                component_start = i + match bytes[i..] {
+                    [_, b'.', b'/', ..] | [_, b'.'] => 2,
+                    _ => 1,
+                };
+            }
         }
+
+        if component_start < bytes.len() {
+            let to_hash = &bytes[component_start..];
+            h.write(to_hash);
+            bytes_hashed += to_hash.len();
+        }
+
+        h.write_usize(bytes_hashed);
     }
 }
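
The property this change has to preserve is that paths which compare equal also hash equally. A quick check against the public `std::path::Path` API might look like the sketch below; `hash_of` is a hypothetical helper, and `DefaultHasher` is used only because it is readily available.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical helper: hashes a path with the standard library's
/// default hasher and returns the resulting value.
fn hash_of(path: &str) -> u64 {
    let mut h = DefaultHasher::new();
    Path::new(path).hash(&mut h);
    h.finish()
}

/// Returns true when two path strings compare equal as `Path`s and
/// also produce the same hash.
fn equal_and_same_hash(a: &str, b: &str) -> bool {
    Path::new(a) == Path::new(b) && hash_of(a) == hash_of(b)
}
```

For example, `equal_and_same_hash("foo//./bar", "foo/bar")` should hold, because `components()` normalizes both spellings to the same sequence.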
