
performance optimization idea: skip Wyhash for auto hashing of pointers and simply truncate their address to u32 #9

Closed
andrewrk opened this issue Jul 6, 2020 · 2 comments

Comments


andrewrk commented Jul 6, 2020

In theory, the least significant bits of a pointer address should already be close to an ideal hash. We should be able to use those directly as the hash rather than hashing the pointer address.

Another possible improvement: since we know the alignment of the element type, the least significant bits are guaranteed to always be zero, so we could shift those zero bits out.

It would look like this:

// Assuming this lives outside the std hash map implementation, these
// imports bring math, Wyhash and autoHash into scope:
const std = @import("std");
const math = std.math;
const Wyhash = std.hash.Wyhash;
const autoHash = std.hash.autoHash;

pub fn getAutoHashFn(comptime K: type) (fn (K) u32) {
    return struct {
        fn hash(key: K) u32 {
            switch (@typeInfo(K)) {
                .Pointer => |info| if (info.size != .Slice) {
                    // No need to pipe this through Wyhash. The least significant bits of
                    // a pointer address are already nearly perfect for use as a hash.
                    const bits_to_shift = comptime math.log2(info.alignment);
                    return @truncate(u32, @ptrToInt(key) >> bits_to_shift);
                },
                else => {},
            }
            var hasher = Wyhash.init(0);
            autoHash(&hasher, key);
            return @truncate(u32, hasher.final());
        }
    }.hash;
}
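For illustration, the truncate-and-shift idea can also be sketched outside Zig. Here is a minimal Python version of the same two steps (the address used is hypothetical):

```python
import math

def ptr_hash(addr: int, alignment: int) -> int:
    # Shift out the low bits that alignment guarantees to be zero,
    # then truncate the result to 32 bits -- the same steps as the
    # proposed Zig hash function above.
    shift = int(math.log2(alignment))
    assert addr % alignment == 0, "pointer must be aligned"
    return (addr >> shift) & 0xFFFFFFFF

# hypothetical 8-byte-aligned heap address
print(hex(ptr_hash(0x7F3A_2B10_0040, 8)))  # -> 0x45620008
```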

As usual it would be good to test this before blindly implementing it.

Related: #2


Sahnvour commented Jul 6, 2020

I fear that this will likely lose entropy and produce a very bad distribution. Shifting gets rid of the least significant bits that are guaranteed to be 0, but it doesn't add new information in the high bits, which will be used in the case of a large hashmap.
It also loses the bits in the [32, 64) range, since Zig's hashes are 32 bits (should they be switched to 64 bits, or machine word size?).

Also, we can assume that the pointers will likely come from some memory allocator, and those typically exhibit patterns; avoiding such patterns is exactly what a hash function is for. Hashing may look like a bottleneck in the use of the hashmap, but it's also what makes it work. Bypassing it feels like a shortcut on a cliff's edge.
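The allocator-pattern concern is easy to demonstrate. In this hypothetical Python sketch, 4096-byte-strided allocations all collide in a single bucket when the shifted-and-truncated address is used directly as the hash:

```python
# Hypothetical allocator: 16-byte-aligned blocks, 4096 bytes apart.
addrs = [0x1000_0000 + 4096 * i for i in range(1000)]

SHIFT = 4        # log2(alignment), as in the proposal
BUCKETS = 256    # small power-of-two hashmap capacity

# The 4096-byte stride becomes a multiple of BUCKETS after the shift,
# so the modulo sees the same value for every key.
buckets = {((a >> SHIFT) & 0xFFFFFFFF) % BUCKETS for a in addrs}
print(len(buckets))  # -> 1: all 1000 keys land in the same bucket
```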

Did you find that wyhash is expensive or badly optimized in the case of pointers? I assume it should get unrolled and inlined into something very efficient. If that's not the case, maybe a special case for integers/pointers could be used, for example MurmurHash's finalizer: https://github.com/Sahnvour/zig-containers/blob/master/hashmap.zig#L11.
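The finalizer referred to here is MurmurHash3's 32-bit fmix step. A Python transcription (a sketch, not the linked Zig code verbatim) shows it breaking up exactly the kind of strided-address pattern an allocator produces:

```python
def fmix32(h: int) -> int:
    # MurmurHash3's 32-bit finalizer: xor-shift/multiply rounds that
    # diffuse every input bit across the whole 32-bit word.
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

# Hypothetical 16-byte-aligned, 4096-byte-strided allocator addresses:
# after mixing they spread across a large fraction of a 256-bucket
# table instead of all colliding in one bucket.
addrs = [0x1000_0000 + 4096 * i for i in range(512)]
buckets = {fmix32(a >> 4) % 256 for a in addrs}
print(len(buckets))
```

Since fmix32 is built from invertible steps (xor-shifts and odd multiplications mod 2^32), it is a bijection on 32-bit values: it never loses entropy, it only redistributes it.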


andrewrk commented Jul 6, 2020

Did you find that wyhash is expensive or badly optimized in the case of pointers?

I did a quick test but I wouldn't say I have a confident answer to that question.

Thanks for the insight on this - I am convinced this is not worth looking into any further.

@andrewrk andrewrk closed this as completed Jul 6, 2020