-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Generic hashtable #12848
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: main
Are you sure you want to change the base?
Generic hashtable #12848
Conversation
Do you have a separate commit that switches SDL to use this implementation? Do you have info on real-world performance and size differences? |
I was trying to avoid this sort of thing, but we've started to depend on hashes pretty heavily in some places, and the benchmarks are pretty compelling. I'd like to see if the implementation tweaks are more important than the inlining, but I doubt they are. I'd also like @Akaricchi's opinion here if possible. |
I've always advocated for templated hashtables (my original implementation was templated), but if we're doing this, I'd like to see adjustable key and value types instead of being stuck with I'm not a fan of the function definitions contained entirely inside macros. A reusable header would've been a little nicer IMO, e.g. #define SDL_HASHTABLE_SUFFIX StringToMyBigStruct
#define SDL_HASHTABLE_KEY_TYPE char*
#define SDL_HASHTABLE_VALUE_TYPE MyBigStruct
#define SDL_HASHTABLE_KEYS_EQUAL(a, b) (!SDL_strcmp(a, b))
#define SDL_HASHTABLE_HASH_KEY(k) my_hash_func(k)
#define SDL_HASHTABLE_COPY_KEY(dst, src) (*(dst) = SDL_strdup(src))
#define SDL_HASHTABLE_FREE_KEY(k) SDL_free(k)
// ... other parameters here
#include "SDL_hashtable.impl.h"
// Inside header: no include guard, #undef all parameter macros at the end
// This way it can be re-included from the same file if needed Additionally, some of the internal helper functions like As for the implementation tweaks, lowering the load factor this much is questionable to me, since robin hood hashtables are supposed to be well behaved under high loads. I'm not sure how much it affects performance, but it's definitely going to increase memory usage. The other tweaks look ok from a quick glance, but it's almost 6 AM and i haven't slept in like 12 hours, so I'll need to have another look at this later. |
Thanks for the feedback! I've implemented it as reusable header, now the usage looks like this: #define SDL_HASHTABLE_TYPENAME UintTable
#define SDL_HASHTABLE_KEY_TYPE Uint64
#define SDL_HASHTABLE_VALUE_TYPE Uint64
#define SDL_HASHTABLE_HASH_KEY(userdata, key) ((key) * 0x9E3779B1u)
#define SDL_HASHTABLE_KEYS_EQUAL(userdata, a, b) ((a) == (b))
#define SDL_HASHTABLE_THREAD_SAFE false
#define SDL_HASHTABLE_MAX_LOAD_FACTOR 128
#include <SDL_generic_hashtable.h> Updated benchmark code#include <SDL3/SDL.h>
#include <SDL3/SDL_stdinc.h>
#include <SDL_hashtable.h>
#include <benchmark/benchmark.h>
#include <chrono>
#include <iostream>
#include <random>
#define SDL_HASHTABLE_TYPENAME UintTable
#define SDL_HASHTABLE_KEY_TYPE Uint64
#define SDL_HASHTABLE_VALUE_TYPE Uint64
#define SDL_HASHTABLE_HASH_KEY(userdata, key) ((key) * 0x9E3779B1u)
#define SDL_HASHTABLE_KEYS_EQUAL(userdata, a, b) ((a) == (b))
#include <SDL_generic_hashtable.h>
void insert(benchmark::State& state) {
int n = state.range(0);
std::mt19937_64 gen(42);
std::vector<Uint64> seq(n);
for(int i = 0; i < n; i++) {
seq[i] = gen();
}
benchmark::DoNotOptimize(seq);
UintTable* table = UintTable_Create(n * 2, nullptr);
benchmark::DoNotOptimize(table);
for(auto _ : state) {
state.PauseTiming();
UintTable_Clear(table);
state.ResumeTiming();
for(int i = 0; i < n; i++) {
UintTable_Insert(table, seq[i], 0, false);
}
}
UintTable_Destroy(table);
}
BENCHMARK(insert)->ArgNames({"n"})->ArgsProduct({{100, 10000, 1000000}});
void find(benchmark::State& state) {
int n = state.range(0);
std::mt19937_64 gen(42);
std::vector<Uint64> seq(n * 2);
for(int i = 0; i < n * 2; i++) {
seq[i] = gen();
}
benchmark::DoNotOptimize(seq);
UintTable* table = UintTable_Create(0, nullptr);
for(int i = 0; i < n; i++) {
UintTable_Insert(table, seq[i], 0, false);
}
benchmark::DoNotOptimize(table);
for(auto _ : state) {
Uint64 value;
if(state.range(1)) {
for(int i = 0; i < n; i++) {
UintTable_Find(table, seq[i], &value);
}
} else {
for(int i = n; i < n * 2; i++) {
UintTable_Find(table, seq[i], &value);
}
}
benchmark::DoNotOptimize(value);
}
}
BENCHMARK(find)->ArgNames({"n", "exists"})->ArgsProduct({{100, 10000, 1000000}, {0, 1}});
void erase(benchmark::State& state) {
int n = state.range(0);
std::mt19937_64 gen(42);
std::vector<Uint64> seq(n * 2);
for(int i = 0; i < n * 2; i++) {
seq[i] = gen();
}
benchmark::DoNotOptimize(seq);
UintTable* table = UintTable_Create(0, nullptr);
benchmark::DoNotOptimize(table);
for(auto _ : state) {
state.PauseTiming();
UintTable_Clear(table);
for(int i = 0; i < n; i++) {
UintTable_Insert(table, seq[i], 0, false);
}
state.ResumeTiming();
if(state.range(1)) {
for(int i = 0; i < n; i++) {
UintTable_Remove(table, seq[i]);
}
} else {
for(int i = n; i < n * 2; i++) {
UintTable_Remove(table, seq[i]);
}
}
}
}
BENCHMARK(erase)->ArgNames({"n", "exists"})->ArgsProduct({{100, 10000, 1000000}, {0, 1}});
BENCHMARK_MAIN(); Updated stress test code#include <SDL3/SDL.h>
#include <SDL3/SDL_stdinc.h>
#include <SDL_hashtable.h>
#include <benchmark/benchmark.h>
#include <chrono>
#include <iostream>
#include <random>
std::mt19937 gen(std::chrono::steady_clock::now().time_since_epoch().count());
void stress() {
SDL_HashTable* table = SDL_CreateHashTable(0, true, SDL_HashID, SDL_KeyMatchID, NULL, NULL);
UintTable* utable = UintTable_Create(0, NULL);
const int N = 1000;
std::vector<Uint64> keys(N), values(N);
for(int i = 0; i < N; i++) {
keys[i] = gen();
values[i] = gen();
}
Uint64 res1, res2;
bool fl1, fl2;
for(int it = 0; it < N * 20; it++) {
Uint64 op = gen() % 3;
Uint64 arg = gen() % N;
switch(op) {
case 0:
fl1 = SDL_InsertIntoHashTable(table, (const void*)keys[arg], (const void*)values[arg], true);
fl2 = UintTable_Insert(utable, keys[arg], values[arg], true);
if(fl1 != fl2) {
exit(1);
}
break;
case 1:
fl1 = SDL_RemoveFromHashTable(table, (const void*)keys[arg]);
fl2 = UintTable_Remove(utable, keys[arg]);
if(fl1 != fl2) {
exit(2);
}
break;
case 2:
res1 = UINT64_MAX;
res2 = UINT64_MAX;
SDL_FindInHashTable(table, (const void*)keys[arg], (const void**)(&res1));
UintTable_Find(utable, keys[arg], &res2);
if(res1 != res2) {
exit(3);
}
break;
}
}
}
int main() {
while(true) {
stress();
}
} Key and value are still stored in SDL_HashItem, so this is not suitable for large size keys and values. In that case user is expected to manage memory by himself and store pointers/indices in hashtable. |
SDL_HashTable uses callbacks to calculate hash, compare keys and destroy items. Compiler doesn't know anything about functions that hashtable is going to call, so some optimizations can't be applied. But in almost all cases we know the functions to be called at compile time. Generic hashtable is a reusable header that generates hashtable implementation for concrete hash, compare and destroy functions. It allows them to be inlined by the compiler, resulting in 1.5-4x performance boost compared to SDL_HashTable. The implementation is taken from SDL_HashTable, with a few improvements: - Lowered max load factor to 128 (50%). Seems to be a good default value, as find is unlikely to load more than one cache line. The value also can be overriden by setting SDL_HASHTABLE_MAX_LOAD_FACTOR. - Removed live bit field, using probe_len = 0 to indicate whether the item is live instead. Seems to be a bit faster, as CPU doesn't need to reset live bit each time probe_len is accessed. - Removed max_probe_len. Unneeded, as we compare to probe_len of each item we visit, which is a stronger break condition.
I'll try to do this when the API will be more stable, currently it changes quickly and keeping everything up to date will be a lot of work. |
Yes, don't switch things over until everyone is happy with the implementation. I appreciate how responsive to feedback you're being, but let's not overwork you. :) |
SDL_HashTable uses callbacks to calculate hash, compare keys and destroy items. Compiler doesn't know anything about functions that hashtable is going to call, so some optimizations can't be applied. But in almost all cases we know the functions to be called at compile time.
Generic hashtable is a macro that generates hashtable implementation for concrete hash, compare and destroy functions. It allows them to be inlined by the compiler, resulting in 1.5-4x performance boost compared to SDL_HashTable. The implementation is taken from SDL_HashTable, with a few improvements:
Lowered max load factor to 128 (50%). Seems to be a good default value, as find is unlikely to load more than one cache line. The value also can be overriden by using SDL_MAKE_HASHTABLE_TYPE_EXT.
Removed live bit field, using
probe_len = 0
to indicate whether the item is live instead. Seems to be a bit faster, as CPU doesn't need to reset live bit each timeprobe_len
is accessed.Removed
max_probe_len
. Unneeded, as we compare toprobe_len
of each item we visit, which is a stronger break condition.Benchmarks were done with Google Benchmark on Ryzen 9 7940HS, Gentoo Linux, performance governor, turbo boost disabled.
Benchmark code for SDL_HashTable
Benchmark code for generic hashtable
This was not tested thoroughly, but I've done basic stress test against SDL_HashTable, so at least there is no obvious issues with it.
Stress test code