`vello_hybrid` implementation #831

grebmeg · 2025-03-05T05:04:08Z

Based on #818: This PR adds a new hybrid sparse strip CPU/GPU renderer (vello_hybrid). The CPU handles path processing and initial geometry setup, while the GPU accelerates compositing and blending operations. The architecture consists of RenderContext for rendering operations, Renderer for GPU resource management, and RenderData container for processed geometry. The implementation supports both windowed and headless rendering, includes various examples, and moves some common functionality to shared modules.

# Render svg to file
cargo run -p vello_hybrid --example render_to_file examples/assets/Ghostscript_Tiger.svg target/Ghostscript_Tiger_VH.png

# Render svg to window
cargo run -p vello_hybrid --example simple
cargo run -p vello_hybrid --example svg examples/assets/Ghostscript_Tiger.svg

sparse_strips/vello_hybrid/src/simd.rs

sparse_strips/vello_hybrid/src/strip.rs

This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api. Quite a bit of editing to satisfy the lint monster. There was a half-written SIMD implementation of flattening; that has been removed. It should be finished and re-added, as it's a good speedup.

Renders a simple scene to the GPU, first by doing coarse rasterization the same as cpu-sparse, then doing a single draw call.

Adds a clip method to the (CPU) render context, plus a considerable amount of mechanism in coarse and fine rasterization to support clipping. The coarse rasterization logic contains a set of optimizations similar to those in Vello. In particular, all-zero tiles have drawing suppressed, and all-one tiles pass drawing commands through with no additional clipping work. Not extensively validated, but it does render a simple scene with clipping correctly.
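The two tile-level shortcuts described in this commit message can be illustrated with a small sketch. This is not the code from the PR: `ClipAction` and `clip_action` are hypothetical names, and treating a clip tile as a flat slice of 8-bit coverage values is an assumption made only to keep the example self-contained.

```rust
/// What to do with drawing commands that fall inside one clip tile.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum ClipAction {
    /// Clip coverage is zero everywhere: nothing is visible, so drawing is suppressed.
    Skip,
    /// Clip coverage is one everywhere: commands pass through with no extra clip work.
    PassThrough,
    /// Partial coverage: commands must be combined with the clip mask.
    Mask,
}

/// Classifies a tile from its clip coverage, assumed here to be one u8 per pixel
/// where 0 means fully clipped and 255 means fully visible.
fn clip_action(coverage: &[u8]) -> ClipAction {
    if coverage.iter().all(|&a| a == 0) {
        ClipAction::Skip
    } else if coverage.iter().all(|&a| a == 255) {
        ClipAction::PassThrough
    } else {
        ClipAction::Mask
    }
}
```

Only the third case needs per-pixel work, which is why the all-zero and all-one checks are worth doing during coarse rasterization.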

This reverts commit 248d08d536eb818f21020c753465469741e73d6c.


DJMcNab · 2025-03-11T11:12:52Z

sparse_strips/vello_hybrid/examples/simple.rs

This example runs into gfx-rs/wgpu#6997. I've added a note about the mentioned backport to the wgpu maintainer meeting notes. It doesn't actually break anything; it only means we don't get a clean shutdown.

Thank you! 🙏 I’m wondering which platform you encountered the issue on?

This issue occurs on Wayland.

sparse_strips/vello_hybrid/examples/svg.rs

tomcur · 2025-03-11T11:32:05Z

sparse_strips/vello_hybrid/src/render.rs

+                    let strip = GpuStrip {
+                        x: tile_x as u16,
+                        y: tile_y as u16,
+                        width: WIDE_TILE_WIDTH as u16,
+                        dense_width: 0,
+                        col: 0,
+                        rgba: bg,
+                    };


It doesn't really matter at this stage, but to be robust for the rare case where we're also running on big-endian hosts, these should be u16::from_ne_bytes((tile_x as u16).to_le_bytes()).
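A minimal sketch of the conversion suggested above, using only the standard library. The helper name `to_gpu_u16` is hypothetical; the PR itself writes the fields with a plain `as u16` cast.

```rust
/// Returns a value whose in-memory bytes are little-endian on any host.
/// On little-endian hosts this is a no-op; on big-endian hosts it swaps the bytes,
/// so a byte-wise copy of the struct always uploads little-endian data.
fn to_gpu_u16(value: u16) -> u16 {
    u16::from_ne_bytes(value.to_le_bytes())
}
```

The standard library's `u16::to_le` produces the same result, so either spelling would work at the call sites.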

Thank you for bringing that up! Would you prefer me to add it now, or wait until the issue arises so that we can fix it along with any other potential related cases?

tomcur · 2025-03-11T11:59:40Z

sparse_strips/vello_hybrid/src/render.rs

+                                width: cmd_strip.width as u16,
+                                dense_width: cmd_strip.width as u16,
+                                col: cmd_strip.alpha_ix as u32,


With the current commands, width and dense_width are always either the same, or dense_width is 0. If it stays this way, the absence of an alpha mask could be encoded as, say, col: u32::MAX. On the other hand, your current encoding would allow merging alpha fills and sparse fills into one command, meaning paths would be drawn with at most two vertex instances per row per wide tile.

On the other hand, your current encoding would allow merging alpha fills and sparse fills into one command, meaning paths would be drawn with at most two vertex instances per row per wide tile

Could you clarify this part? Are you referring to a more advanced batching logic?

My current preference would be to have a:

#[cfg(all(not(vello_big_endian_unchecked), target_endian = "big"))]
compile_error!("Vello currently does not support big endian targets. Enable the `vello_big_endian_unchecked` cfg flag to try and run it anyway.");

This is however better as a follow-up.

Could you clarify this part? Are you referring to a more advanced batching logic?

It could be done in batching, but the logic may fit better in command generation (after strip rendering). The idea is that if you're encoding width and dense_width for draw commands, you can fold CmdFill and CmdAlphaFill into a single command. That would reduce the number of vertex instances required to often be two per path per wide tile, reducing uploads to the GPU. You could go one further and encode a dense_offset as well, often requiring just one vertex instance per wide tile for drawing paths.
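The folding idea can be sketched as follows. This is an illustration under stated assumptions, not code from the PR: the struct copies the field names visible in the diff, but the interpretation (the first `dense_width` columns read the alpha mask at `col`, the remaining columns are solid) and the helper `merge_alpha_and_solid` are hypothetical.

```rust
/// Field names mirror the diff above; their exact meaning is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GpuStrip {
    x: u16,
    y: u16,
    width: u16,
    dense_width: u16,
    col: u32,
    rgba: u32,
}

/// Folds an alpha fill and the solid fill directly to its right into one instance.
/// Returns `None` when the two strips cannot be represented by a single instance.
fn merge_alpha_and_solid(alpha: GpuStrip, solid: GpuStrip) -> Option<GpuStrip> {
    let is_alpha_fill = alpha.dense_width == alpha.width;
    let is_solid_fill = solid.dense_width == 0;
    let same_row_and_paint = alpha.y == solid.y && alpha.rgba == solid.rgba;
    let adjacent = alpha.x.checked_add(alpha.width) == Some(solid.x);
    if !(is_alpha_fill && is_solid_fill && same_row_and_paint && adjacent) {
        return None;
    }
    // Keep the alpha strip's position, mask index, and dense width;
    // only the total width grows to cover the solid part.
    let width = alpha.width.checked_add(solid.width)?;
    Some(GpuStrip { width, ..alpha })
}
```

Under these assumptions, one instance replaces two whenever a path's anti-aliased edge is immediately followed by its solid interior on the same row.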

I've added a note to the renderer office hours agenda.

tomcur · 2025-03-11T12:26:10Z

sparse_strips/vello_hybrid/src/render.rs

+                                width: fill.width as u16,
+                                dense_width: 0,
+                                col: 0,
+                                rgba: color.to_rgba8().to_u32(),


I believe this should be the following.

Suggested change

-                                rgba: color.to_rgba8().to_u32(),
+                                rgba: color.premultiply().to_rgba8().to_u32(),

The same for the AlphaFill command below and the bg above.
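To show what the premultiplication step does to the packed value, here is a self-contained sketch on plain 8-bit channels. It does not use the color types from the PR; `premultiply_rgba8` and `pack_rgba8` are hypothetical helpers, and the little-endian packing order is an assumption rather than the documented behaviour of `to_u32`.

```rust
/// Scales each color channel by alpha, rounding to the nearest integer.
fn premultiply_rgba8(rgba: [u8; 4]) -> [u8; 4] {
    let a = u32::from(rgba[3]);
    let scale = |c: u8| ((u32::from(c) * a + 127) / 255) as u8;
    [scale(rgba[0]), scale(rgba[1]), scale(rgba[2]), rgba[3]]
}

/// Packs the four channels into one u32 with `r` in the lowest byte (assumed order).
fn pack_rgba8(rgba: [u8; 4]) -> u32 {
    u32::from_le_bytes(rgba)
}
```

A half-transparent orange `[255, 128, 0, 128]` becomes `[128, 64, 0, 128]`, while fully opaque colors are unchanged.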

Thanks!

Fixed in 4485f2e

tomcur · 2025-03-11T12:47:33Z

Looking at the pixel coverage of the whiskers (and pixel RGBA values of those), I believe the image is missing unpremultiplication.

grebmeg

Looking at the pixel coverage of the whiskers (and pixel RGBA values of those), I believe the image is missing unpremultiplication.

Fixed in cecfb41
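The inverse step, applied to a buffer read back from the GPU before it is written to an image file, might look like the sketch below. The function name and the tightly packed RGBA8 buffer layout are assumptions; the actual fix in the referenced commit may be structured differently.

```rust
/// Converts premultiplied RGBA8 pixels to straight (unpremultiplied) alpha in place.
/// `pixels` is assumed to hold tightly packed 4-byte RGBA pixels.
fn unpremultiply_in_place(pixels: &mut [u8]) {
    for px in pixels.chunks_exact_mut(4) {
        let a = u32::from(px[3]);
        if a == 0 {
            // Fully transparent: the color is undefined, so store zero.
            px[..3].fill(0);
            continue;
        }
        for c in &mut px[..3] {
            // Divide by alpha with rounding, clamping in case of invalid input.
            let v = (u32::from(*c) * 255 + a / 2) / a;
            *c = v.min(255) as u8;
        }
    }
}
```

Without this step, semi-transparent edge pixels such as the whiskers mentioned above look darker than intended in viewers that expect straight alpha.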

grebmeg · 2025-03-11T10:56:28Z

sparse_strips/vello_hybrid/src/gpu.rs

+
+        // Create initial texture for alpha values
+        // It will be recreated if needed in prepare
+        let initial_alpha_texture_width = 64;


Yes, that makes sense.

Fixed in fa1d947

sparse_strips/vello_hybrid/shaders/sparse_strip_renderer.wgsl

grebmeg · 2025-03-11T23:39:54Z

sparse_strips/vello_hybrid/src/gpu.rs

+        // Create initial texture for alpha values
+        // It will be recreated if needed in prepare
+        let initial_alpha_texture_width = 64;
+        let initial_alpha_texture_height = 64;


Yes, we could pass the count of processed strips when initializing Renderer, but I’m a bit hesitant to tie Renderer::new directly to the data. Since we have a prepare method, it might be a better place to consolidate all resource management logic. What do you think?

Something like what's in commit 8f06569.

grebmeg · 2025-03-12T00:32:47Z

sparse_strips/vello_hybrid/src/gpu.rs

+        self.queue.write_buffer(
+            &self.strips_buffer,
+            0,
+            bytemuck::cast_slice(&render_data.strips),
+        );


Yes, I agree, we should explore that in a separate PR. I suspect it will provide a greater performance boost as strips_buffer grows larger. However, I’d like to gather some concrete data first, so my initial focus will be on introducing performance measurements, and then we can explore that. How does that sound?

Added TODO in 2c3de83

grebmeg · 2025-03-12T02:21:18Z

sparse_strips/vello_hybrid/src/gpu.rs

+            assert!(
+                alpha_texture_height <= max_texture_dimension_2d,
+                "Alpha texture height exceeds WebGL2 limit"
+            );


Thanks for the suggestion! Yeah, this approach increases storage by a factor of 4. Now, I’m using the Rgba32Uint texture format, which efficiently packs all u32 alpha values. I checked the WebGL2 spec (search for RGBA32UI), and it looks like this format is supported.
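As a rough illustration of the sizing arithmetic behind this change, the sketch below computes texture dimensions when four u32 alpha values share one RGBA texel. The function, its parameters, and the fixed-width layout are assumptions for the example and are not taken from the PR; it returns `None` where the PR's code asserts.

```rust
/// Number of u32 alpha values stored per texel in an RGBA 32-bit unsigned integer texture.
const ALPHAS_PER_TEXEL: u32 = 4;

/// Returns `(width, height)` in texels for `alpha_count` packed alpha values,
/// or `None` if the texture would exceed the device's 2D texture limit.
fn alpha_texture_size(
    alpha_count: u32,
    texture_width: u32,
    max_texture_dimension_2d: u32,
) -> Option<(u32, u32)> {
    if texture_width == 0 || texture_width > max_texture_dimension_2d {
        return None;
    }
    let texels = alpha_count.div_ceil(ALPHAS_PER_TEXEL);
    // Keep at least one row so the texture is never zero-sized.
    let height = texels.div_ceil(texture_width).max(1);
    (height <= max_texture_dimension_2d).then_some((texture_width, height))
}
```

With a 64-texel-wide texture, 1000 alpha values fit in 4 rows, whereas one value per texel would need 16 rows.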

Fixed in beebbc0

grebmeg · 2025-03-12T02:32:26Z

sparse_strips/vello_hybrid/src/render.rs

+    /// Stroke a path with the current paint and stroke settings.
+    pub fn stroke_path(&mut self, path: &BezPath) {
+        flatten::stroke(path, &self.stroke, self.transform, &mut self.line_buf);
+        self.render_path(Fill::NonZero, self.paint.clone());


Yes, that's right, and you're correct that this approach has implications for batching. The current architecture processes and finalizes each path independently. In a more optimized renderer, we might batch similar paint operations to reduce state changes and improve GPU utilization. However, this would require more complex state management and potentially more memory usage to track all paths before rendering. Also, for the web environment, we are focusing on the simplest solution, which is single-threaded.

grebmeg · 2025-03-12T02:48:14Z

sparse_strips/vello_hybrid/src/render.rs

+                    let strip = GpuStrip {
+                        x: tile_x as u16,
+                        y: tile_y as u16,
+                        width: WIDE_TILE_WIDTH as u16,
+                        dense_width: 0,
+                        col: 0,
+                        rgba: bg,
+                    };


Thank you for bringing that up! Would you prefer me to add it now, or wait until the issue arises so that we can fix it along with any other potential related cases?

grebmeg · 2025-03-12T03:07:00Z

sparse_strips/vello_hybrid/src/render.rs

+                                width: fill.width as u16,
+                                dense_width: 0,
+                                col: 0,
+                                rgba: color.to_rgba8().to_u32(),


Thanks!

Fixed in 4485f2e

grebmeg · 2025-03-12T04:06:33Z

sparse_strips/vello_hybrid/src/render.rs

+                                width: cmd_strip.width as u16,
+                                dense_width: cmd_strip.width as u16,
+                                col: cmd_strip.alpha_ix as u32,


On the other hand, your current encoding would allow merging alpha fills and sparse fills into one command, meaning paths would be drawn with at most two vertex instances per row per wide tile

Could you clarify this part? Are you referring to a more advanced batching logic?

DJMcNab · 2025-03-12T09:11:45Z

sparse_strips/vello_hybrid/shaders/sparse_strip_renderer.wgsl

+        // Fallback, should never happen
+        default: { return rgba.x; }
+    }
+}


This should have a final newline. You can normally configure your editor to add these automatically.

Unfortunately, I'm not aware of a good formatter for WGSL files, so this can't really be checked on CI.

Oh, my bad! I missed that.

You can normally configure your editor to add these automatically.

Yeah, I recently switched to vscode for some projects and haven’t configured everything yet.

Fixed in 93038ab

DJMcNab

I've mentioned a blocking concern in #gpu > Vello Hybrid (of incorrect rendering after the first frame). Once we get that resolved, I'm happy for this to land from my perspective.

The GPU resource management story is something that makes me quite sad (because the existing state in Vello is not great, and that's been brought in here); there are quite a few changes I want to make here. However, those are better as follow-ups.

taj-p · 2025-03-12T21:58:59Z

sparse_strips/vello_hybrid/src/scene.rs

+    }
+
+    /// Fill a rectangle with the current paint and fill rule.
+    pub fn fill_rect(&mut self, rect: &Rect) {


Should we re-export kurbo so that consumers don't need to pull it in themselves? e.g.

pub use peniko::kurbo;

taj-p · 2025-03-12T22:03:20Z

sparse_strips/vello_hybrid/src/render.rs

+    }
+
+    /// Prepare the GPU buffers for rendering
+    pub fn prepare(


Suggested change

-    pub fn prepare(
+    fn prepare(

Currently this is only called in render_to_texture. Maybe we should keep it private until we think it should be exposed?

taj-p · 2025-03-12T22:03:40Z

sparse_strips/vello_hybrid/src/render.rs

+
+/// Options for the renderer
+#[derive(Debug)]
+pub struct RendererOptions {}


WDYT of providing a default implementation for RendererOptions?
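One way to act on this suggestion is to derive `Default`, sketched below on the struct exactly as it appears in the diff. Whether the PR ended up deriving it or writing a manual impl is not shown in this thread.

```rust
/// Options for the renderer
#[derive(Debug, Default)]
pub struct RendererOptions {}
```

Callers could then write `RendererOptions::default()` and stay source-compatible if fields with sensible defaults are added later.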

DJMcNab reviewed Mar 5, 2025

sparse_strips/vello_hybrid/src/simd.rs (outdated)

sparse_strips/vello_hybrid/src/strip.rs (outdated)

raphlinus and others added 29 commits March 10, 2025 18:42

Fix lints in non-aarch64 cfg's

55f9528

Start wiring up GPU render pipeline

2ba526b

Renders a simple scene to the GPU, first by doing coarse rasterization the same as cpu-sparse, then doing a single draw call.

Add missing file, fix lints

32a4b5d

Remove vello_hybrid lib.rs file

a742987

Move vello_api and vello_hybrid to sparse_strips

e300ab2

Move vello_hybrid-specific code from vello_api to vello_hybrid

807da60

Refactor vello_hybrid to use internal API module

6d2aaaa

Remove SIMD implementation

a637fef

Use concrete API implementation instead of trait-based

294d895

Revert "Implement basic clip logic"

43b7640

This reverts commit 248d08d536eb818f21020c753465469741e73d6c.

refact: use vello_common

fad44f5

refactor: simplify GPU rendering context and buffer preparation

b2a9a3c

refact: move PicoSvg into pico_svg module

da921ee

split svg example into two svg_cpu and svg_gpu examples

7a76ca2

feat: add PicoSvg parsing and color parsing tests

06e6ae9

.

42f08f2

.

89a4ef2

feat: add winit-based GPU rendering to SVG example

570c0ae

fix: correct color format

9b0e742

.

5172e37

refactor: migrate strips and alpha data to 2D textures for WebGL comp…

60a426f

…atibility

feat: add performance measurement feature for GPU rendering

1e0a354

refactor: switch from texture-based to vertex instance buffer rendering

684c078

refactor: enhance rendering flexibility with RenderTarget and headles…

1525be7

…s rendering support

refactor: simplify project structure and remove unused modules

8ac8d28

refactor: improve documentation and code clarity across modules

9955c02

refactor: move buffer management from renderer initialization to prepare

e9edea0

grebmeg force-pushed the vello-sparse-strips-hybrid-clean-up branch from 25e9650 to fba44cd on March 11, 2025 10:51

refactor: optimize alpha texture initialization and resizing

fa1d947

DJMcNab reviewed Mar 11, 2025

sparse_strips/vello_hybrid/examples/svg.rs

tomcur reviewed Mar 11, 2025

grebmeg added 6 commits March 12, 2025 11:18

refactor: consolidate all resource management logic into prepare method

8f06569

perf: add TODO for potential buffer write optimization

2c3de83

refactor: use Rgba32Uint texture format to compactly store alphas

beebbc0

fix: premultiply alpha for color conversion in prepare_render_data

4485f2e

fix: unpremultiply alpha after rendering to pixmap

cecfb41

refactor: improve variable naming

eb354b5

grebmeg commented Mar 12, 2025

grebmeg added 4 commits March 12, 2025 15:18

Merge remote-tracking branch 'upstream/main' into vello-sparse-strips…

f25dd25

…-hybrid-clean-up

fix: pass width and height to make_tiles

949d228

chore: Remove unused features section from Cargo.toml

9bb8b97

perf: optimize texture coordinate calculation using bitwise operations

3d4ed71

DJMcNab reviewed Mar 12, 2025

grebmeg added 3 commits March 12, 2025 20:38

refactor: use to user-provided texture, instance and device for rende…

a2c2ac6

…ring

fix: adding trailing newline in sparse strip renderer shader

93038ab

refactor: reorganize imports and correct doc error

08cf54f

grebmeg requested review from DJMcNab, taj-p and tomcur March 12, 2025 10:05

DJMcNab reviewed Mar 12, 2025

taj-p reviewed Mar 12, 2025
