add benchmarks for std.AutoHashMap #2
Relevant, but not very fleshed out (and not updated to latest Zig), so maybe not that useful: https://github.com/squeek502/zig-hash-map-bench

The links in the readme should be useful, though.
One thing I'm unsure about with this is whether or not it's possible to have a single benchmark be representative of the speed of a hash map (if you look at the links in the repo linked above, all benchmarks are presented as graphs). That is, because hash maps can have very different performance characteristics depending on the size of the map, a benchmark like 'add 1,000,000 elements' would only tell you about performance at that particular size.

Unless I'm thinking about it wrong, I'm not sure this is a solvable problem without breaking the benchmarking into multiple different benchmarks differentiated by the size of the hash map being benchmarked. And, if that's true, then it's not clear how many different sizes should be benchmarked and what those sizes should be (would only 'small' and 'huge' be necessary? 'small', 'medium', 'large', and 'huge'? way more?).
The idea here would be to create multiple different benchmarks, one for each "use case" that you want to keep a performance history on. Hopefully that is reasonable to maintain, since the manifest JSON file lets you re-use directories for multiple different benchmarks, so you should only have to implement one tiny function for each different benchmark. My intuition tells me that 3-5 benchmarks should be enough to get a reasonable performance picture of this API.

Another way to approach this would be to create benchmarks that represent real-world usage of the API. So if you have a project that uses the API, make a benchmark that simulates how your project would use it. This ensures that the performance of your use case is explicitly being monitored for regressions. As you can imagine, this will become relevant for making sure self-hosted is and stays fast.
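For concreteness, here is a rough sketch of what size-differentiated AutoHashMap benchmarks could look like. It is only an illustration, not code from this repo's manifest: the function names, sizes, and the multiplier constant are made up, and it is written against the same older std.AutoHashMap API that appears in the snippet later in this thread (init taking a pointer to an allocator, putNoClobber, getValue).

```zig
const std = @import("std");
const AutoHashMap = std.AutoHashMap;

// Shared body: insert `n` keys, then look each one up again.
fn insertThenLookup(alloc: *std.mem.Allocator, n: u32) !u64 {
    var map = AutoHashMap(u32, u32).init(alloc);
    defer map.deinit();
    var i: u32 = 0;
    while (i < n) : (i += 1) {
        try map.putNoClobber(i, i *% 2654435761);
    }
    var sum: u64 = 0;
    i = 0;
    while (i < n) : (i += 1) {
        if (map.getValue(i)) |v| sum += v;
    }
    return sum; // return a value so the work can't be optimized away
}

// One tiny entry point per benchmarked size (hypothetical names/sizes).
pub fn benchSmall(alloc: *std.mem.Allocator) !u64 {
    return insertThenLookup(alloc, 1000);
}
pub fn benchMedium(alloc: *std.mem.Allocator) !u64 {
    return insertThenLookup(alloc, 100000);
}
pub fn benchLarge(alloc: *std.mem.Allocator) !u64 {
    return insertThenLookup(alloc, 10000000);
}
```

The point is that each entry point stays tiny: the sizes are the only thing that varies, so the performance history can show how the map behaves at each scale.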
So I was toying around with Zig and implemented Project Euler 14 (longest Collatz sequence for starting numbers under one million) for fun. Since it was easy, I ported it directly to Go and ran some benchmarks. Since this makes heavy use of the HashMap, maybe it's a decent benchmark for some use cases. TL;DR:
Code:

```zig
const std = @import("std");
const AutoHashMap = std.AutoHashMap;

// One step of the Collatz sequence.
fn step(x: u64) u64 {
    if (x & 1 > 0) {
        return 3 * x + 1;
    } else {
        return x / 2;
    }
}

// Memoized length of the Collatz sequence starting at x.
fn length(cache: *AutoHashMap(u64, u64), x: u64) anyerror!u64 {
    if (x <= 1) return 0;
    if (cache.getValue(x)) |e| {
        return e;
    } else {
        const next = step(x);
        const len = 1 + try length(cache, next);
        try cache.putNoClobber(x, len);
        return len;
    }
}

pub fn main() anyerror!void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const alloc = &arena.allocator;

    var cache = AutoHashMap(u64, u64).init(alloc);
    defer cache.deinit();
    try cache.ensureCapacity(2000000);

    // Find the starting number below one million with the longest chain.
    var x: u64 = 0;
    var maxx: u64 = 0;
    var maxl: u64 = 0;
    while (x < 1000000) : (x += 1) {
        const l = try length(&cache, x);
        if (l > maxl) {
            maxl = l;
            maxx = x;
        }
    }
    std.debug.warn("{} {}\n", .{ maxx, maxl });
}
```

```go
package main

import "fmt"

// One step of the Collatz sequence.
func step(x uint64) uint64 {
    if x&1 > 0 {
        return 3*x + 1
    } else {
        return x / 2
    }
}

// Memoized length of the Collatz sequence starting at x.
func length(cache map[uint64]uint64, x uint64) uint64 {
    if x <= 1 {
        return 0
    }
    if e, ok := cache[x]; ok {
        return e
    } else {
        next := step(x)
        l := 1 + length(cache, next)
        cache[x] = l
        return l
    }
}

func main() {
    cache := make(map[uint64]uint64, 2000000)
    var maxx uint64 = 0
    var maxl uint64 = 0
    for x := uint64(0); x < 1000000; x++ {
        l := length(cache, x)
        if l > maxl {
            maxl = l
            maxx = x
        }
    }
    fmt.Println(maxx, maxl)
}
```

Timings
Thanks @lemmi for this benchmark; it's now added to this repo. Before closing this issue, I think we should come up with 2-3 more benchmarks that try to cover other use cases of hash maps: maybe one that deals with strings, and one that creates and destroys many small hash maps in memory. Any other suggestions are welcome too.
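To make the second suggestion concrete, here is a rough sketch of a "many small, short-lived maps" benchmark. It is my illustration, not code from this repo; the map size and loop count are arbitrary, and it again uses the older std.AutoHashMap API from the snippet above.

```zig
const std = @import("std");
const AutoHashMap = std.AutoHashMap;

// Hypothetical use case: lots of tiny, short-lived maps, so init/deinit and
// small-table growth dominate rather than steady-state lookups.
pub fn benchSmallMapChurn(alloc: *std.mem.Allocator) !u64 {
    var sum: u64 = 0;
    var round: u32 = 0;
    while (round < 100000) : (round += 1) {
        var map = AutoHashMap(u32, u32).init(alloc);
        defer map.deinit(); // runs at the end of each loop iteration
        var i: u32 = 0;
        while (i < 8) : (i += 1) {
            try map.putNoClobber(i, round +% i);
        }
        if (map.getValue(round % 8)) |v| sum += v;
    }
    return sum; // keep the result observable so nothing is optimized away
}
```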
Just saw Andrew's tweets, good work everyone. The code is a bit old, however.
Short update: while @andrewrk integrated the test, I was in the middle of benchmarking my implementation of a Swiss-table hash map that uses SIMD for (hopefully) faster access times. Code: https://gist.github.com/lemmi/6576efbfd9fcc7fbf74371e42e3146a8

Once I've ported my code over to the new API, I'll run the tests again with the new HashMap from current master.

Brief explanation: the last number indicates the SIMD width, so 32, for example, means 32-byte-wide SIMD is used for metadata storage. With 16 the compiler emits SSE code; with 32 I can actually see AVX code. It's still rather puzzling to me why the scalar version is the fastest on all architectures. Maybe the cost of loading a whole vector and iterating through the bitmask is too high compared to the cheap comparison of a 64-bit key, which will almost never be a miss with this kind of table.

AMD FX(tm)-8350

Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz

AMD Ryzen 5 3600X 6-Core Processor
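For readers unfamiliar with the approach, here is a minimal sketch of the group-matching idea being described (not the code from the gist above): each group stores 16 one-byte tags, and a single vector compare checks all 16 slots at once. It assumes current Zig vector builtins (@Vector, @splat, @reduce), whose syntax differs from the older Zig used elsewhere in this thread.

```zig
const Group = @Vector(16, u8); // 16 one-byte metadata tags, one per slot

// Return the index of the first slot whose tag matches the key's one-byte
// fingerprint `h2`, or null if no slot in the group matches.
fn findInGroup(tags: Group, h2: u8) ?usize {
    const needle: Group = @splat(h2);
    const matches = tags == needle; // one SIMD compare covers all 16 slots
    if (!@reduce(.Or, matches)) return null; // whole group rejected at once
    var i: usize = 0;
    while (i < 16) : (i += 1) {
        if (matches[i]) return i;
    }
    return null;
}
```

With `@Vector(32, u8)` the same comparison covers a 32-byte group, which is the 16-versus-32 width distinction in the measurements above.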
@lemmi In my experience, going for a similar design to Google's hash table (8-bit metadata per element) with scalar code was faster. I think the SIMD is beneficial to them because they try to achieve an extremely high load factor (97.5% iirc), which implies quite a lot of probing, especially when they use C++'s standard hash, which is of very poor quality. In your implementation, I see you're using modulus to map hashes to slots; a power-of-two capacity with a bitwise mask is usually noticeably cheaper.
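As an illustration of that last point (a general sketch, not code from lemmi's gist): reducing a 64-bit hash to a slot index with `%` needs an integer division, while a power-of-two table size lets the same reduction be a single AND.

```zig
// Works for any table size, but compiles to an integer division.
fn slotByMod(hash: u64, capacity: u64) u64 {
    return hash % capacity;
}

// Requires the capacity to be a power of two; compiles to a single AND.
fn slotByMask(hash: u64, capacity_pow2: u64) u64 {
    return hash & (capacity_pow2 - 1);
}
```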
@Sahnvour thanks a lot for your input. Your comment made me want to measure memory usage behaviour when not preallocating memory. Also, getting rid of the modulus operations did indeed gain me a couple tens of milliseconds compared to the numbers above.

Benchmarks without preallocation: the third column is the maximum allocated memory. I left out the worse-performing vector sizes. Code: https://gist.github.com/lemmi/c4e9dbedc5424a82e300c0000cf46728

AMD FX(tm)-8350

Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz

AMD Ryzen 5 3600X 6-Core Processor

I hope I don't carry this too much off topic.
This is very much on topic!
I'm adding some benchmarks that can take up to a few tens of seconds. Should we make them as atomic as possible, or is it ok to have them all in a single benchmark?