Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

CpuBufferPool revamp #2076

Merged
merged 4 commits into from
Nov 5, 2022
Merged

CpuBufferPool revamp #2076

merged 4 commits into from
Nov 5, 2022

Conversation

marc0246
Copy link
Contributor

@marc0246 marc0246 commented Nov 2, 2022

Fixes the performance bottlenecks of CpuBufferPool, now renamed to CpuBufferAllocator. The algorithm has been optimized to pretty much the theoretical limit, which required that the allocator is made !Sync. There still is the option to wrap the allocator in a mutex, if absolutely neccessary. I did a number of benchmarks to measure the throughput before and after for different sizes of inputs (using a logarithmic scale), and pretty consistenly observed a -67% reduction in overhead, or a 3X performance increase. I suspect that someone with better hardware than mine will observe better results, though. According to my profiling, all time spent in CpuBufferAllocator::from_iter is now spent on copying data to the mapping. As that is completely dependant on the hardware and driver, I hope that this closes #1434. There isn't much more we can do in that sense I don't think.

I also noticed that the allocator having a T type parameter didn't really make sense. Mainly for the reason that the whole point of having one buffer be suballocated is so that you can fit all kinds of data needed each frame into one buffer, which is not possible if the buffers have different types, unless you cast your data to &[u8], in which case alignment goes ouf of the window on the other hand. Since there really was no reason for this constraint, I have moved the type parameter to the methods that allocate subbuffers. It also didn't make sense semantically, because the allocator doesn't own any Ts, the allocated subbuffers do.

Changelog:

### Breaking changes:
Changes to CPU buffer allocation:
- Replaced `CpuBufferPool` with `CpuBufferAllocator`, which is now marked `!Sync` and no longer has a `T` type parameter. The type parameter was moved to the methods, to allow one allocator to allocate as many types of buffers as needed.
- Merged `CpuBufferPoolChunk` and `CpuBufferPoolSubbuffer` into `CpuSubbuffer`.

### Additions
- Added `CpuBufferAllocatorCreateInfo`.

### Bugs fixed
- Fixed an issue with `CpuBufferPool<T>`, where the alignment of `T` was not being considered when allocating.
- Fixed an issue with `CpuBufferPool`, where the allocated subbuffers did not respect the non-coherent atom size for non-host-coherent memory types.

@Rua
Copy link
Contributor

Rua commented Nov 5, 2022

Are you intending to implement some of the things we talked about yesterday, or is this good to go now?

@marc0246
Copy link
Contributor Author

marc0246 commented Nov 5, 2022

All good.

@Rua Rua merged commit fe01ddd into vulkano-rs:master Nov 5, 2022
Rua added a commit that referenced this pull request Nov 5, 2022
@marc0246 marc0246 deleted the cpu-pool-revamp branch November 8, 2022 10:58
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CpuBufferPool slower than glium
2 participants