Parallelize batching #12489
base: main
Conversation
crates/bevy_pbr/src/prepass/mod.rs
Outdated
```rust
    batch_and_prepare_render_phase::<Opaque3dPrepass, MeshPipeline>,
    batch_and_prepare_render_phase::<AlphaMask3dPrepass, MeshPipeline>,
)
.after(allocate_batch_buffer::<MeshPipeline>),
```
This starts to look like we maybe want a batching plugin that is generic over render phase item and the base specialisation pipeline or something.
Agreed there. This is getting unwieldy to manage.
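For illustration, a minimal sketch of what such a generic batching plugin could look like. `BatchingPlugin` and its bounds are hypothetical names invented for this sketch, not types from this PR; the real systems (`clear_batch_buffer`, `reserve_batch_buffer`, `allocate_batch_buffer`, `batch_and_prepare_render_phase`) would be registered inside `build` for each (phase item, pipeline) pair.

```rust
// Hypothetical sketch only: `BatchingPlugin` does not exist in this PR.
// It illustrates how the clear/reserve/allocate/batch systems could be
// registered once, generically, instead of being listed per phase by hand.
use std::marker::PhantomData;

use bevy_app::{App, Plugin};

pub struct BatchingPlugin<Item, Pipeline>(PhantomData<(Item, Pipeline)>);

impl<Item, Pipeline> Default for BatchingPlugin<Item, Pipeline> {
    fn default() -> Self {
        Self(PhantomData)
    }
}

impl<Item, Pipeline> Plugin for BatchingPlugin<Item, Pipeline>
where
    // In practice these would be the render phase item type (e.g.
    // `Opaque3dPrepass`) and the pipeline providing the batch data
    // (e.g. `MeshPipeline`), with whatever trait bounds the systems need.
    Item: Send + Sync + 'static,
    Pipeline: Send + Sync + 'static,
{
    fn build(&self, _app: &mut App) {
        // Register clear_batch_buffer, reserve_batch_buffer,
        // allocate_batch_buffer, and batch_and_prepare_render_phase for this
        // (Item, Pipeline) pair here, with the ordering constraints from the
        // diff above, rather than repeating them at every call site.
    }
}
```

Adding the prepass phases would then be something like `app.add_plugins(BatchingPlugin::<Opaque3dPrepass, MeshPipeline>::default())`, rather than wiring each system tuple by hand.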
Today, when we upload data to a `StorageBuffer`, it must be copied twice: once from `StorageBuffer::value` to `StorageBuffer::scratch`, and once from `StorageBuffer::scratch` to the GPU. This patch eliminates the former copy in favor of storing the single canonical copy of the data directly inside the `scratch` field. This patch complements bevyengine#12489, which eliminates both copies for mesh data at the cost of requiring more overhead for GPU I/O, especially for single-threaded situations. Simply eliminating one of the copies seems an unambiguous win (and is lower-hanging fruit), so both this patch and PR bevyengine#12489 are valuable.
Co-Authored By: robert.swain@gmail.com
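For context, a rough sketch of the two-copy path that commit message describes. The struct below only mirrors the `value`/`scratch` fields named above; the bodies and trait bounds are simplified assumptions, not Bevy's actual `StorageBuffer` implementation.

```rust
// Simplified illustration of the double copy described above; not Bevy's code.
use wgpu::{Buffer, Queue};

struct StorageBuffer<T> {
    value: T,         // canonical CPU-side value
    scratch: Vec<u8>, // encoded bytes staged for upload
}

impl<T: AsRef<[u8]>> StorageBuffer<T> {
    fn write_buffer(&mut self, queue: &Queue, buffer: &Buffer) {
        // Copy 1: `value` -> `scratch`. Storing the single canonical copy of
        // the data directly in `scratch` makes this copy unnecessary.
        self.scratch.clear();
        self.scratch.extend_from_slice(self.value.as_ref());
        // Copy 2: `scratch` -> GPU. #12489 goes further and removes this copy
        // too for mesh data, via `Queue::write_buffer_with`.
        queue.write_buffer(buffer, 0, &self.scratch);
    }
}
```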
Objective
Improve CPU-side rendering performance.
Solution
Create `BufferPool` variants of `DynamicBuffer`, `StorageBuffer`, `BatchedUniformBuffer`, and `GpuArrayBuffer`. They do not have a system-RAM-side buffer of any kind, but rely on `Queue::write_buffer_with` to write values directly into a staging buffer. As `Queue::write_buffer_with` operates with a `&Queue`, it's possible to parallelize down to the view level when batching. The downside is that these buffers are not resizable after being mapped, so these types must reserve fixed-size slices from the buffer ahead of time. The data flow runs as follows:

- `clear_batch_buffer` clears the buffer pool.
- `reserve_batch_buffer` mutably grabs the buffer pool, reserves a range for every `RenderPhase<T>`, and saves the reservation in the `RenderPhase`. This is a very fast O(1) operation that requires no allocation or I/O of any kind. These reservations must run sequentially because they need mutable access to both the pool and the render phases.
- `allocate_batch_buffer` allocates the actual GPU-side buffer.
- `batch_and_prepare_render_phase` then runs on each phase in parallel and parallelizes individual views with `Query::par_iter_mut` (see the sketch below).
NOTE: I'm likely getting something wrong with the indices, which causes mesh draw calls to be mismatched and meshes to be rendered randomly; depending on the scene this can be seizure-inducing. Please be aware of this while testing this change.
Performance
Tested against `many_foxes`, this nets a rough 4% improvement in render schedule timings.

TODO: Test this against heavier scenes.
Changelog
TODO
Migration Guide
TODO