Fixes the performance bottlenecks of `CpuBufferPool`, now renamed to `CpuBufferAllocator`. The algorithm has been optimized to pretty much the theoretical limit, which required making the allocator `!Sync`. There is still the option to wrap the allocator in a mutex if absolutely necessary. I ran a number of benchmarks to measure the throughput before and after for different input sizes (on a logarithmic scale), and pretty consistently observed a 67% reduction in overhead, or a 3x performance increase. I suspect that someone with better hardware than mine will observe better results, though. According to my profiling, all the time spent in `CpuBufferAllocator::from_iter` is now spent on copying data to the mapping. As that is completely dependent on the hardware and driver, I hope that this closes #1434. I don't think there is much more we can do in that regard.

I also noticed that the allocator having a `T` type parameter didn't really make sense, mainly because the whole point of suballocating one buffer is that you can fit all kinds of data needed each frame into it. That is not possible if the buffers have different types, unless you cast your data to `&[u8]`, in which case alignment goes out of the window. Since there really was no reason for this constraint, I have moved the type parameter to the methods that allocate subbuffers. It also didn't make sense semantically, because the allocator doesn't own any `T`s; the allocated subbuffers do.

Changelog: