
performance optimization idea: skip Wyhash for auto hashing of pointers and simply truncate their address to u32 #9

Closed
andrewrk opened this issue Jul 6, 2020 · 2 comments

Comments


andrewrk commented Jul 6, 2020

In theory, the least significant bits of a pointer address should already be close to an ideal hash. We should be able to use those directly as the hash rather than hashing the pointer address.

Another possible improvement: since we know the alignment of the element type, the least significant bits are guaranteed to always be zero, so we could shift those zero bits out.

It would look like this:

// Assuming this lives outside the std hash map implementation, these
// imports bring math, Wyhash and autoHash into scope:
const std = @import("std");
const math = std.math;
const Wyhash = std.hash.Wyhash;
const autoHash = std.hash.autoHash;

pub fn getAutoHashFn(comptime K: type) (fn (K) u32) {
    return struct {
        fn hash(key: K) u32 {
            switch (@typeInfo(K)) {
                .Pointer => |info| if (info.size != .Slice) {
                    // No need to pipe this through Wyhash. The least significant bits of
                    // a pointer address are already nearly perfect for use as a hash.
                    const bits_to_shift = comptime math.log2(info.alignment);
                    return @truncate(u32, @ptrToInt(key) >> bits_to_shift);
                },
                else => {},
            }
            var hasher = Wyhash.init(0);
            autoHash(&hasher, key);
            return @truncate(u32, hasher.final());
        }
    }.hash;
}
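For illustration, the truncate-and-shift idea can also be sketched outside Zig. Here is a minimal Python version of the same two steps (the address used is hypothetical):

```python
import math

def ptr_hash(addr: int, alignment: int) -> int:
    # Shift out the low bits that alignment guarantees to be zero,
    # then truncate the result to 32 bits -- the same steps as the
    # proposed Zig hash function above.
    shift = int(math.log2(alignment))
    assert addr % alignment == 0, "pointer must be aligned"
    return (addr >> shift) & 0xFFFFFFFF

# hypothetical 8-byte-aligned heap address
print(hex(ptr_hash(0x7F3A_2B10_0040, 8)))  # -> 0x45620008
```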

As usual it would be good to test this before blindly implementing it.

Related: #2


Sahnvour commented Jul 6, 2020

I fear that this will likely lose entropy and produce a very bad distribution. Shifting gets rid of the least significant bits that are guaranteed to be 0, but it doesn't add new information in the high bits, which will be used in the case of a large hashmap.
It also loses the bits in the [32, 64) range, since Zig's hashes are 32 bits (should they be switched to 64 bits, or machine word size?).

Also, we can assume that the pointers will likely come from some memory allocator, and those typically exhibit patterns; avoiding such patterns is exactly what a hash function is for. Hashing may look like a bottleneck in the use of the hashmap, but it's also what makes it work. Bypassing it feels like a shortcut on a cliff's edge.
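The allocator-pattern concern is easy to demonstrate. In this hypothetical Python sketch, 4096-byte-strided allocations all collide in a single bucket when the shifted-and-truncated address is used directly as the hash:

```python
# Hypothetical allocator: 16-byte-aligned blocks, 4096 bytes apart.
addrs = [0x1000_0000 + 4096 * i for i in range(1000)]

SHIFT = 4        # log2(alignment), as in the proposal
BUCKETS = 256    # small power-of-two hashmap capacity

# The 4096-byte stride becomes a multiple of BUCKETS after the shift,
# so the modulo sees the same value for every key.
buckets = {((a >> SHIFT) & 0xFFFFFFFF) % BUCKETS for a in addrs}
print(len(buckets))  # -> 1: all 1000 keys land in the same bucket
```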

Did you find that wyhash is expensive or badly optimized in the case of pointers? I assume it should get unrolled and inlined into something very efficient. If that's not the case, maybe a special case for integers/pointers could be used, for example MurmurHash's finalizer: https://github.com/Sahnvour/zig-containers/blob/master/hashmap.zig#L11.
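The finalizer referred to here is MurmurHash3's 32-bit fmix step. A Python transcription (a sketch, not the linked Zig code verbatim) shows it breaking up exactly the kind of strided-address pattern an allocator produces:

```python
def fmix32(h: int) -> int:
    # MurmurHash3's 32-bit finalizer: xor-shift/multiply rounds that
    # diffuse every input bit across the whole 32-bit word.
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

# Hypothetical 16-byte-aligned, 4096-byte-strided allocator addresses:
# after mixing they spread across a large fraction of a 256-bucket
# table instead of all colliding in one bucket.
addrs = [0x1000_0000 + 4096 * i for i in range(512)]
buckets = {fmix32(a >> 4) % 256 for a in addrs}
print(len(buckets))
```

Since fmix32 is built from invertible steps (xor-shifts and odd multiplications mod 2^32), it is a bijection on 32-bit values: it never loses entropy, it only redistributes it.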


andrewrk commented Jul 6, 2020

Did you find that wyhash is expensive or badly optimized in the case of pointers?

I did a quick test but I wouldn't say I have a confident answer to that question.

Thanks for the insight on this - I am convinced this is not worth looking into any further.

@andrewrk andrewrk closed this as completed Jul 6, 2020