cache-align the shards to improve throughput #303
Merged
Uses `CachePadded` to cache-align the `RwLock` shards for improved locking performance. This gives considerable (30-50%) improvements in both latency and throughput on my M2.
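A minimal sketch of the idea, assuming a plain `std` sharded map rather than DashMap's actual internals: each shard is wrapped in a 128-byte-aligned type so that two neighbouring shards never share a cache line, which avoids false sharing when different threads lock different shards. `Padded`, `ShardedMap` and `shard_for` are hypothetical names; `Padded` stands in for `crossbeam_utils::CachePadded` only so the example has no external dependencies.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};
use std::ops::Deref;
use std::sync::RwLock;

/// Hypothetical stand-in for `crossbeam_utils::CachePadded`: the alignment
/// forces every value onto its own 128-byte boundary.
#[repr(align(128))]
struct Padded<T>(T);

impl<T> Deref for Padded<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

/// Hypothetical sharded map: one padded `RwLock<HashMap>` per shard.
struct ShardedMap<K, V> {
    shards: Box<[Padded<RwLock<HashMap<K, V>>>]>,
    hasher: RandomState,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count.is_power_of_two());
        let shards = (0..shard_count)
            .map(|_| Padded(RwLock::new(HashMap::new())))
            .collect();
        Self { shards, hasher: RandomState::new() }
    }

    /// Pick a shard from the key's hash; a power-of-two count allows masking.
    fn shard_for(&self, key: &K) -> &RwLock<HashMap<K, V>> {
        let idx = (self.hasher.hash_one(key) as usize) & (self.shards.len() - 1);
        &self.shards[idx] // deref coercion: &Padded<RwLock<..>> -> &RwLock<..>
    }

    fn insert(&self, key: K, value: V) -> Option<V> {
        self.shard_for(&key).write().unwrap().insert(key, value)
    }

    fn get(&self, key: &K) -> Option<V> {
        self.shard_for(key).read().unwrap().get(key).cloned()
    }
}

fn main() {
    // Each shard slot starts on its own 128-byte boundary.
    assert_eq!(std::mem::align_of::<Padded<RwLock<HashMap<u64, u64>>>>(), 128);
    let map = ShardedMap::new(8);
    map.insert(1u64, "one");
    assert_eq!(map.get(&1), Some("one"));
}
```

The cost of this layout is that every shard occupies at least a full 128-byte slot, however small the lock and table header are.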
Note
This increases memory usage. On x86_64, the cache alignment is set to 128 bytes. The current size of `RwLock<HashMap<K, V, std::RandomState>>` is 8 + 16 + 32 = 56 bytes, and the current size of `RwLock<HashMap<K, V, ahash::RandomState>>` is 8 + 32 + 32 = 72 bytes. Padding each shard to 128 bytes therefore roughly doubles the size of the empty collection. For example, on a 64-core CPU (256 shards by default) the empty std-hasher DashMap grows from 14 KiB to 32 KiB. This increase is a fixed overhead, though; it does not grow with the number of elements inserted into the map.
Important
This is a breaking change for the raw shards API.
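The memory figures in the note above can be checked with a few lines of arithmetic. This sketch assumes DashMap's default shard count of `(cores * 4).next_power_of_two()` and takes the per-shard sizes (56 and 72 bytes) and the 128-byte alignment directly from the note; `shard_amount` and `padded_size` are hypothetical helper names.

```rust
/// Assumption: DashMap's default shard count is `(cores * 4).next_power_of_two()`.
fn shard_amount(cores: usize) -> usize {
    (cores * 4).next_power_of_two()
}

/// Round `size` up to the next multiple of `align`, i.e. the padded shard size.
fn padded_size(size: usize, align: usize) -> usize {
    (size + align - 1) / align * align
}

fn main() {
    const CACHE_ALIGN: usize = 128; // x86_64 alignment quoted in the note
    let std_shard = 8 + 16 + 32; // RwLock<HashMap<K, V, std::RandomState>> = 56 bytes
    let ahash_shard = 8 + 32 + 32; // RwLock<HashMap<K, V, ahash::RandomState>> = 72 bytes
    let shards = shard_amount(64); // 64 cores -> 256 shards

    for (name, unpadded) in [("std", std_shard), ("ahash", ahash_shard)] {
        let before = shards * unpadded;
        let after = shards * padded_size(unpadded, CACHE_ALIGN);
        // prints "std: 256 shards, 14 KiB -> 32 KiB" and "ahash: 256 shards, 18 KiB -> 32 KiB"
        println!("{name}: {shards} shards, {} KiB -> {} KiB", before / 1024, after / 1024);
    }
}
```

Both hasher variants end up at the same 32 KiB, because a 56- or 72-byte shard is rounded up to one 128-byte slot either way.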
Benchmark results
In this benchmark, `DashMap` is the released version 5.3.3 and `DashMap2` is this version.