Implement minimal GPU culling for cameras.
This commit introduces a new component, `GpuCulling`, which, when
present on a camera, skips the CPU visibility check in favor of doing
the frustum culling on the GPU. This accepts potentially increased
CPU work and draw calls in exchange for cheaper culling, and it doesn't
improve the performance of any workloads that I know of today. However, it opens
the door to significant optimizations in the future by taking the
necessary first step toward *GPU-driven rendering*.

Enabling GPU culling for a view puts the rendering for that view into
*indirect mode*. In indirect mode, CPU-level visibility checks are
skipped, and all visible entities are considered potentially visible.
Bevy's batching logic still runs as usual, but it doesn't directly
generate mesh instance indices. Instead, it generates *instance
handles*, which are indices into an array of real instance indices.
Before any rendering is done, for each view, a compute shader,
`cull.wgsl`, maps instance handles to instance indices, discarding any
instance handles that represent meshes that are outside the visible
frustum. Draws are then done using the *indirect draw* feature of
`wgpu`, which instructs the GPU to read the number of actual instances
from the output of that compute shader.

Essentially, GPU culling works by adding a new level of indirection
between the CPU's notion of instances (known as instance handles) and
the GPU's notion of instances.
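
The indirection can be modeled on the CPU. What follows is a minimal sketch under stated assumptions, not the code in `cull.wgsl`: the names `Plane`, `BoundingSphere`, `IndirectArgs`, `cull_batch`, and `lookup_instance_index` are hypothetical, and bounding spheres are assumed as the culling primitive purely for illustration. The culling step compacts the surviving instance indices into a table and records the survivor count where an indirect draw would read it; the lookup step plays the role that a vertex shader would play when it turns the GPU-supplied instance number back into a real mesh instance index.

```rust
// Hypothetical CPU-side model of the culling pass. None of these names are Bevy's.

/// A frustum plane; a point `p` is on the inside when `normal . p + d >= 0`.
#[derive(Clone, Copy)]
pub struct Plane {
    pub normal: [f32; 3],
    pub d: f32,
}

/// World-space bounding sphere of one mesh instance (an assumed culling primitive).
#[derive(Clone, Copy)]
pub struct BoundingSphere {
    pub center: [f32; 3],
    pub radius: f32,
}

/// Illustrative stand-in for indexed indirect draw arguments.
#[derive(Debug, Default, PartialEq)]
pub struct IndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Returns true if the sphere is at least partly inside every plane.
fn sphere_in_frustum(sphere: &BoundingSphere, planes: &[Plane]) -> bool {
    planes.iter().all(|p| {
        let dist = p.normal[0] * sphere.center[0]
            + p.normal[1] * sphere.center[1]
            + p.normal[2] * sphere.center[2]
            + p.d;
        dist >= -sphere.radius
    })
}

/// Culls one batch. `candidates` holds the real mesh instance indices the CPU
/// batched together; the returned table keeps only the visible ones, packed
/// densely, and `instance_count` tells the draw how many entries are valid.
pub fn cull_batch(
    candidates: &[u32],
    bounds: &[BoundingSphere],
    planes: &[Plane],
    index_count: u32,
) -> (Vec<u32>, IndirectArgs) {
    let visible: Vec<u32> = candidates
        .iter()
        .copied()
        .filter(|&i| sphere_in_frustum(&bounds[i as usize], planes))
        .collect();
    let args = IndirectArgs {
        index_count,
        instance_count: visible.len() as u32,
        ..Default::default()
    };
    (visible, args)
}

/// Maps the instance number seen during the draw back to a real instance index.
pub fn lookup_instance_index(table: &[u32], drawn_instance: u32) -> u32 {
    table[drawn_instance as usize]
}
```

In this model, a batch of three candidates with one sphere fully behind a plane yields a two-entry table and an `instance_count` of 2, so the draw never touches the culled instance and the CPU never needs to know which one was dropped.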

A new `--gpu-culling` flag has been added to the `many_foxes`,
`many_cubes`, and `3d_shapes` examples.

Potential follow-ups include:

* Split up `RenderMeshInstances` into CPU-driven and GPU-driven parts.
  The former, which contain fields like the transform, won't be
  initialized at all when GPU culling is enabled. Instead, the
  transform will be directly written to the GPU in `extract_meshes`,
  like `extract_skins` does for joint matrices.

* Implement GPU culling for shadow maps.

  - Following that, we can treat all cascades as one as far as the CPU
    is concerned, simply replaying the final draw commands with
    different view uniforms, which should reduce the CPU overhead
    considerably.

* Retain bins from frame to frame so that they don't have to be rebuilt.
  This is a longer-term project that will build on top of bevyengine#12453 and
  several of the tasks in bevyengine#12590, such as main-world pipeline
  specialization.
pcwalton committed Mar 23, 2024
1 parent e33b93e commit 8ebfd67
Showing 32 changed files with 1,049 additions and 162 deletions.
@@ -7,8 +7,8 @@
@group(0) @binding(3) var dt_lut_texture: texture_3d<f32>;
@group(0) @binding(4) var dt_lut_sampler: sampler;
#else
-@group(0) @binding(18) var dt_lut_texture: texture_3d<f32>;
-@group(0) @binding(19) var dt_lut_sampler: sampler;
+@group(0) @binding(19) var dt_lut_texture: texture_3d<f32>;
+@group(0) @binding(20) var dt_lut_sampler: sampler;
#endif

fn sample_current_lut(p: vec3<f32>) -> vec3<f32> {
3 changes: 3 additions & 0 deletions crates/bevy_gizmos/src/lib.rs
@@ -434,6 +434,7 @@ impl<const I: usize, P: PhaseItem> RenderCommand<P> for SetLineGizmoBindGroup<I>
#[inline]
fn render<'w>(
_item: &P,
+_index: usize,
_view: ROQueryItem<'w, Self::ViewQuery>,
uniform_index: Option<ROQueryItem<'w, Self::ItemQuery>>,
bind_group: SystemParamItem<'w, '_, Self::Param>,
@@ -460,6 +461,7 @@ impl<P: PhaseItem> RenderCommand<P> for DrawLineGizmo {
#[inline]
fn render<'w>(
_item: &P,
+_index: usize,
_view: ROQueryItem<'w, Self::ViewQuery>,
handle: Option<ROQueryItem<'w, Self::ItemQuery>>,
line_gizmos: SystemParamItem<'w, '_, Self::Param>,
@@ -506,6 +508,7 @@ impl<P: PhaseItem> RenderCommand<P> for DrawLineJointGizmo {
#[inline]
fn render<'w>(
_item: &P,
+_index: usize,
_view: ROQueryItem<'w, Self::ViewQuery>,
handle: Option<ROQueryItem<'w, Self::ItemQuery>>,
line_gizmos: SystemParamItem<'w, '_, Self::Param>,
4 changes: 4 additions & 0 deletions crates/bevy_pbr/src/deferred/mod.rs
@@ -301,6 +301,10 @@ impl SpecializedRenderPipeline for DeferredLightingLayout {
shader_defs.push("MOTION_VECTOR_PREPASS".into());
}

+if key.contains(MeshPipelineKey::INDIRECT) {
+shader_defs.push("INDIRECT".into());
+}
+
// Always true, since we're in the deferred lighting pipeline
shader_defs.push("DEFERRED_PREPASS".into());

3 changes: 3 additions & 0 deletions crates/bevy_pbr/src/lib.rs
@@ -62,6 +62,8 @@ pub mod graph {
/// Label for the screen space ambient occlusion render node.
ScreenSpaceAmbientOcclusion,
DeferredLightingPass,
+/// Label for the GPU culling node.
+GpuCull,
}
}

@@ -267,6 +269,7 @@ impl Plugin for PbrPlugin {
ExtractComponentPlugin::<ShadowFilteringMethod>::default(),
LightmapPlugin,
LightProbePlugin,
+GpuCullPlugin,
))
.configure_sets(
PostUpdate,
13 changes: 9 additions & 4 deletions crates/bevy_pbr/src/material.rs
@@ -24,7 +24,7 @@ use bevy_render::{
render_resource::*,
renderer::RenderDevice,
texture::FallbackImage,
-view::{ExtractedView, Msaa, VisibleEntities},
+view::{ExtractedView, GpuCulling, Msaa, VisibleEntities},
Extract,
};
use bevy_utils::{tracing::error, HashMap, HashSet};
@@ -396,6 +396,7 @@ impl<P: PhaseItem, M: Material, const I: usize> RenderCommand<P> for SetMaterial
#[inline]
fn render<'w>(
item: &P,
+_index: usize,
_view: (),
_item_query: Option<()>,
(materials, material_instances): SystemParamItem<'w, '_, Self::Param>,
@@ -492,15 +493,16 @@ pub fn queue_material_meshes<M: Material>(
Has<DeferredPrepass>,
),
Option<&Camera3d>,
-Has<TemporalJitter>,
Option<&Projection>,
&mut RenderPhase<Opaque3d>,
&mut RenderPhase<AlphaMask3d>,
&mut RenderPhase<Transmissive3d>,
&mut RenderPhase<Transparent3d>,
(
+Has<TemporalJitter>,
Has<RenderViewLightProbes<EnvironmentMapLight>>,
Has<RenderViewLightProbes<IrradianceVolume>>,
+Has<GpuCulling>,
),
)>,
) where
@@ -515,13 +517,12 @@ pub fn queue_material_meshes<M: Material>(
ssao,
(normal_prepass, depth_prepass, motion_vector_prepass, deferred_prepass),
camera_3d,
-temporal_jitter,
projection,
mut opaque_phase,
mut alpha_mask_phase,
mut transmissive_phase,
mut transparent_phase,
-(has_environment_maps, has_irradiance_volumes),
+(temporal_jitter, has_environment_maps, has_irradiance_volumes, gpu_culling),
) in &mut views
{
let draw_opaque_pbr = opaque_draw_functions.read().id::<DrawMaterial<M>>();
@@ -560,6 +561,10 @@ pub fn queue_material_meshes<M: Material>(
view_key |= MeshPipelineKey::IRRADIANCE_VOLUME;
}

+if gpu_culling {
+view_key |= MeshPipelineKey::INDIRECT;
+}
+
if let Some(projection) = projection {
view_key |= match projection {
Projection::Perspective(_) => MeshPipelineKey::VIEW_PROJECTION_PERSPECTIVE,
5 changes: 5 additions & 0 deletions crates/bevy_pbr/src/prepass/mod.rs
@@ -412,6 +412,10 @@ where
shader_defs.push("MOTION_VECTOR_PREPASS".into());
}

+if key.mesh_key.contains(MeshPipelineKey::INDIRECT) {
+shader_defs.push("INDIRECT".into());
+}
+
if key.mesh_key.intersects(
MeshPipelineKey::NORMAL_PREPASS
| MeshPipelineKey::MOTION_VECTOR_PREPASS
@@ -902,6 +906,7 @@ impl<P: PhaseItem, const I: usize> RenderCommand<P> for SetPrepassViewBindGroup<
#[inline]
fn render<'w>(
_item: &P,
+_index: usize,
(view_uniform_offset, previous_view_projection_uniform_offset): (
&'_ ViewUniformOffset,
Option<&'_ PreviousViewProjectionUniformOffset>,
23 changes: 9 additions & 14 deletions crates/bevy_pbr/src/prepass/prepass.wgsl
@@ -42,12 +42,14 @@ fn vertex(vertex_no_morph: Vertex) -> VertexOutput {
var vertex = vertex_no_morph;
#endif

+// Use vertex_no_morph.instance_index instead of vertex.instance_index to work around a wgpu dx12 bug.
+// See https://github.com/gfx-rs/naga/issues/2416
+let instance_index = mesh_functions::get_mesh_instance_index(vertex_no_morph.instance_index);
+
#ifdef SKINNED
var model = skinning::skin_model(vertex.joint_indices, vertex.joint_weights);
#else // SKINNED
-// Use vertex_no_morph.instance_index instead of vertex.instance_index to work around a wgpu dx12 bug.
-// See https://github.com/gfx-rs/naga/issues/2416
-var model = mesh_functions::get_model_matrix(vertex_no_morph.instance_index);
+var model = mesh_functions::get_model_matrix(instance_index);
#endif // SKINNED

out.position = mesh_functions::mesh_position_local_to_clip(model, vec4(vertex.position, 1.0));
@@ -68,21 +70,14 @@ fn vertex(vertex_no_morph: Vertex) -> VertexOutput {
#ifdef SKINNED
out.world_normal = skinning::skin_normals(model, vertex.normal);
#else // SKINNED
-out.world_normal = mesh_functions::mesh_normal_local_to_world(
-vertex.normal,
-// Use vertex_no_morph.instance_index instead of vertex.instance_index to work around a wgpu dx12 bug.
-// See https://github.com/gfx-rs/naga/issues/2416
-vertex_no_morph.instance_index
-);
+out.world_normal = mesh_functions::mesh_normal_local_to_world(vertex.normal, instance_index);
#endif // SKINNED

#ifdef VERTEX_TANGENTS
out.world_tangent = mesh_functions::mesh_tangent_local_to_world(
model,
vertex.tangent,
-// Use vertex_no_morph.instance_index instead of vertex.instance_index to work around a wgpu dx12 bug.
-// See https://github.com/gfx-rs/naga/issues/2416
-vertex_no_morph.instance_index
+instance_index
);
#endif // VERTEX_TANGENTS
#endif // NORMAL_PREPASS_OR_DEFERRED_PREPASS
@@ -97,15 +92,15 @@ fn vertex(vertex_no_morph: Vertex) -> VertexOutput {
// Use vertex_no_morph.instance_index instead of vertex.instance_index to work around a wgpu dx12 bug.
// See https://github.com/gfx-rs/naga/issues/2416
out.previous_world_position = mesh_functions::mesh_position_local_to_world(
-mesh_functions::get_previous_model_matrix(vertex_no_morph.instance_index),
+mesh_functions::get_previous_model_matrix(instance_index),
vec4<f32>(vertex.position, 1.0)
);
#endif // MOTION_VECTOR_PREPASS

#ifdef VERTEX_OUTPUT_INSTANCE_INDEX
// Use vertex_no_morph.instance_index instead of vertex.instance_index to work around a wgpu dx12 bug.
// See https://github.com/gfx-rs/naga/issues/2416
-out.instance_index = vertex_no_morph.instance_index;
+out.instance_index = instance_index;
#endif

return out;