In theory, the least significant bits of a pointer address should already be close to an ideal hash. We should be able to use those directly as the hash rather than hashing the pointer address.
Another possible improvement: if we know the alignment of the element type, the least significant bits are guaranteed to be zero, so we could shift those zero bits out.
It would look like this:
pub fn getAutoHashFn(comptime K: type) (fn (K) u32) {
    return struct {
        fn hash(key: K) u32 {
            switch (@typeInfo(K)) {
                .Pointer => |info| if (info.size != .Slice) {
                    // No need to pipe this through Wyhash. The least significant bits of
                    // a pointer address are already nearly perfect for use as a hash.
                    const bits_to_shift = comptime math.log2(info.alignment);
                    return @truncate(u32, @ptrToInt(key) >> bits_to_shift);
                },
                else => {},
            }
            var hasher = Wyhash.init(0);
            autoHash(&hasher, key);
            return @truncate(u32, hasher.final());
        }
    }.hash;
}
As usual it would be good to test this before blindly implementing it.
I fear that it will likely lose entropy and produce a very bad distribution. Shifting gets rid of the least significant bits that are guaranteed to be 0, but it doesn't add any new information in the high bits, which are the ones that get used once the hashmap is large.
It also discards bits 32 to 63 of the address, since Zig's hashes are 32 bits (they should be switched to 64 bits, or machine word size, right?)
Also, we can assume that the pointers will likely come from some memory allocator, and allocators typically exhibit patterns. Breaking up such patterns is exactly what a hash function is for. Hashing may look like a bottleneck in the use of the hashmap, but it's also what makes it work. Bypassing it feels like a shortcut on a cliff's edge.
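To make the allocator concern concrete, here is a minimal sketch under stated assumptions: the table has a power-of-two number of buckets and picks one by masking the hash, the element type is 8-byte aligned (shift of 3), and a hypothetical allocator hands out blocks 64 bytes apart starting at 0x1000. The helper `bucketOf` and all of these numbers are made up for illustration; they are not taken from `std`.

```zig
const std = @import("std");

// Hypothetical helper: bucket index when the shifted address is used
// directly as the hash. Assumes bucket_count is a power of two.
fn bucketOf(addr: usize, comptime align_shift: comptime_int, bucket_count: usize) usize {
    return (addr >> align_shift) & (bucket_count - 1);
}

test "64-byte stride reaches only 2 of 16 buckets" {
    var used = [_]bool{false} ** 16;

    // Assumed allocation pattern: 32 blocks, 64 bytes apart.
    var i: usize = 0;
    while (i < 32) : (i += 1) {
        const addr = 0x1000 + 64 * i;
        used[bucketOf(addr, 3, 16)] = true;
    }

    var count: usize = 0;
    for (used) |u| {
        if (u) count += 1;
    }

    // After the shift every value is a multiple of 8, so masking with 15
    // can only yield bucket 0 or bucket 8.
    try std.testing.expect(count == 2);
}
```

Running this with `zig test` shows all 32 keys landing in just two buckets. A real allocator will behave differently, but any regular stride larger than the alignment leaves buckets unused in the same way.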
Did you find that Wyhash is expensive or badly optimized in the case of pointers? I assume it should get unrolled and inlined to something very efficient. If that's not the case, maybe a special case for integers/pointers could be used, for example MurmurHash's finalizer: https://github.com/Sahnvour/zig-containers/blob/master/hashmap.zig#L11.
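For reference, a sketch of what such a special case could look like, using the 64-bit finalizer from MurmurHash3 (`fmix64`). This is the standard published finalizer, not necessarily the exact code at the link above, and the function name here is only illustrative. It returns the full 64 bits; a caller would feed it the pointer address and truncate to the hash width, as in the snippet from the issue.

```zig
const std = @import("std");

// MurmurHash3 64-bit finalizer: xor-shifts and wrapping multiplications
// by odd constants. Each step is invertible, so distinct inputs always
// give distinct 64-bit outputs.
fn fmix64(key: u64) u64 {
    var h = key;
    h ^= h >> 33;
    h *%= 0xff51afd7ed558ccd;
    h ^= h >> 33;
    h *%= 0xc4ceb9fe1a85ec53;
    h ^= h >> 33;
    return h;
}

test "neighbouring addresses stay distinct" {
    // Two made-up addresses 8 bytes apart.
    try std.testing.expect(fmix64(0x1000) != fmix64(0x1008));
}
```

The point of the finalizer is that the high bits of the address also influence the low bits of the result, which the plain shift does not do. Note that distinctness is only guaranteed for the full 64-bit output; truncating to 32 bits can still collide.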
Related: #2