Skip to content

Fix 32- vs 64-bit platform instability in StableHasher. #45522

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 2 commits into from
Oct 26, 2017
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 16 additions & 24 deletions src/librustc_data_structures/stable_hasher.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,24 +13,13 @@ use std::marker::PhantomData;
use std::mem;
use sip128::SipHasher128;

/// When hashing something that ends up affecting properties like symbol names. We
/// want these symbol names to be calculated independent of other factors like
/// what architecture you're compiling *from*.
/// When hashing something that ends up affecting properties like symbol names,
/// we want these symbol names to be calculated independently of other factors
/// like what architecture you're compiling *from*.
///
/// The hashing just uses the standard `Hash` trait, but the implementations of
/// `Hash` for the `usize` and `isize` types are *not* architecture independent
/// (e.g. they has 4 or 8 bytes). As a result we want to avoid `usize` and
/// `isize` completely when hashing.
///
/// To do that, we encode all integers to be hashed with some
/// arch-independent encoding.
///
/// At the moment, we pass i8/u8 straight through and encode
/// all other integers using leb128.
///
/// This hasher currently always uses the stable Blake2b algorithm
/// and allows for variable output lengths through its type
/// parameter.
/// To that end we always convert integers to little-endian format before
/// hashing and the architecture dependent `isize` and `usize` types are
/// extended to 64 bits if needed.
pub struct StableHasher<W> {
state: SipHasher128,
bytes_hashed: u64,
Expand Down Expand Up @@ -86,9 +75,6 @@ impl<W> StableHasher<W> {
}
}

// For the non-u8 integer cases we leb128 encode them first. Because small
// integers dominate, this significantly and cheaply reduces the number of
// bytes hashed, which is good because blake2b is expensive.
impl<W> Hasher for StableHasher<W> {
fn finish(&self) -> u64 {
panic!("use StableHasher::finalize instead");
Expand Down Expand Up @@ -132,8 +118,11 @@ impl<W> Hasher for StableHasher<W> {

Copy link
Contributor

@arielb1 arielb1 Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment that says

// For the non-u8 integer cases we leb128 encode them first. Because small

and another comment that says

/// At the moment, we pass i8/u8 straight through and encode
/// all other integers using leb128.

Please fix it

#[inline]
fn write_usize(&mut self, i: usize) {
self.state.write_usize(i.to_le());
self.bytes_hashed += ::std::mem::size_of::<usize>() as u64;
// Always treat usize as u64 so we get the same results on 32 and 64 bit
// platforms. This is important for symbol hashes when cross compiling,
// for example.
self.state.write_u64((i as u64).to_le());
self.bytes_hashed += 8;
}

#[inline]
Expand Down Expand Up @@ -168,8 +157,11 @@ impl<W> Hasher for StableHasher<W> {

#[inline]
fn write_isize(&mut self, i: isize) {
self.state.write_isize(i.to_le());
self.bytes_hashed += ::std::mem::size_of::<isize>() as u64;
// Always treat isize as i64 so we get the same results on 32 and 64 bit
// platforms. This is important for symbol hashes when cross compiling,
// for example.
self.state.write_i64((i as i64).to_le());
self.bytes_hashed += 8;
}
}

Expand Down