-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
vello_hybrid
implementation
#831
base: main
Are you sure you want to change the base?
vello_hybrid
implementation
#831
Conversation
This brings in the cpu-sparse prototype from the piet-next branch of the piet repo. No substantive changes, but cpu-sparse is renamed vello_hybrid and piet-next is renamed vello_api. Quite a bit of editing to satisfy the lint monster. There was a half-written SIMD implementation of flattening, that's removed. It should be finished and re-added, as it's a good speedup.
Renders a simple scene to the GPU, first by doing coarse rasterization the same as cpu-sparse, then doing a single draw call.
Adds a clip method to the (CPU) render context, plus a considerable amount of mechanism in coarse and fine rasterization to support clipping. The coarse rasterization logic contains a similar set of optimizations as Vello. In particular, all-zero tiles have drawing suppressed, and all-one tiles pass drawing commands through with no additional work to clip. Not extensively validated, but it does render a simple scene with clipping correctly.
This reverts commit 248d08d536eb818f21020c753465469741e73d6c.
…s rendering support
25e9650
to
fba44cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is running into gfx-rs/wgpu#6997. I've put a note on the mentioned backport into the wgpu maintainer meeting notes, and it doesn't actually break anything (it only means we don't get a clean shutdown)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! 🙏 I’m wondering which platform you encountered the issue on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue occurs on Wayland.
let strip = GpuStrip { | ||
x: tile_x as u16, | ||
y: tile_y as u16, | ||
width: WIDE_TILE_WIDTH as u16, | ||
dense_width: 0, | ||
col: 0, | ||
rgba: bg, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't really matter at this stage, but to be robust for the rare case where we're also running on big-endian hosts, these should be u16::from_ne_bytes((tile_x as u16).to_le_bytes())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing that up! Would you prefer me to add it now, or wait until the issue arises so that we can fix it along with any other potential related cases?
width: cmd_strip.width as u16, | ||
dense_width: cmd_strip.width as u16, | ||
col: cmd_strip.alpha_ix as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current commands, width
and dense_width
are always either the same, or dense_width
is 0. If that stays this way, the absence of an alpha mask could be encoded as, say, col: u32::MAX
. On the other hand, your current encoding would allow merging alpha fills and sparse fills into one command, meanings paths would be drawn with at most two vertex instances per row per wide tile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, your current encoding would allow merging alpha fills and sparse fills into one command, meanings paths would be drawn with at most two vertex instances per row per wide tile
Could you clarify this part? Are you referring to a more advanced batching logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current preference would be to have a:
#[cfg(all(not(vello_big_endian_unchecked), target_endian = "big"))]
compile_fail!("Vello currently does not support big endian targets. Enable the `vello_big_endian_unchecked` cfg flag to try and run it anyway.");
This is however better as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify this part? Are you referring to a more advanced batching logic?
It could be done in batching, but the logic may fit better in command generation (after strip rendering). The idea is that if you're encoding width
and dense_width
for draw commands, you can fold CmdFill
and CmdAlphaFill
into a single command. That would reduce the number of vertex instances required to often be two per path per wide tile, reducing uploads to the GPU. You could go one further and encode a dense_offset
as well, often requiring just one vertex instance per wide tile for drawing paths.
I've added a note to the renderer office hours agenda.
width: fill.width as u16, | ||
dense_width: 0, | ||
col: 0, | ||
rgba: color.to_rgba8().to_u32(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be the following.
rgba: color.to_rgba8().to_u32(), | |
rgba: color.premultiply().to_rgba8().to_u32(), |
The same for the AlphaFill
command below and the bg
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Fixed in 4485f2e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the pixel coverage of the whiskers (and pixel RGBA values of those), I believe the image is missing unpremultiplication.
Fixed in cecfb41
|
||
// Create initial texture for alpha values | ||
// It will be recreated if needed in prepare | ||
let initial_alpha_texture_width = 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense.
Fixed in fa1d947
// Create initial texture for alpha values | ||
// It will be recreated if needed in prepare | ||
let initial_alpha_texture_width = 64; | ||
let initial_alpha_texture_height = 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could pass the count of processed strips when initializing Renderer, but I’m a bit hesitant to tie Renderer::new
directly to the data. Since we have prepare
method, it might be a better place to consolidate all resource management logic. What do you think?
Something like in that 8f06569 commit
self.queue.write_buffer( | ||
&self.strips_buffer, | ||
0, | ||
bytemuck::cast_slice(&render_data.strips), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, we should explore that in a separate PR. I suspect it will provide a greater performance boost as strips_buffer
grows larger. However, I’d like to gather some concrete data, so my initial focus will be on introducing performance measurements and then we can explore that. How does it sound?
Added TODO in 2c3de83
assert!( | ||
alpha_texture_height <= max_texture_dimension_2d, | ||
"Alpha texture height exceeds WebGL2 limit" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! Yeah, this approach increases storage by a factor of 4. Now, I’m using the Rgba32Uint
texture format, which efficiently packs all u32 alpha values. I checked the WebGL2 spec (search for RGBA32UI
), and it looks like this format is supported.
Fixed in beebbc0
/// Stroke a path with the current paint and stroke settings. | ||
pub fn stroke_path(&mut self, path: &BezPath) { | ||
flatten::stroke(path, &self.stroke, self.transform, &mut self.line_buf); | ||
self.render_path(Fill::NonZero, self.paint.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right, and you're correct that this approach has implications for batching. The current architecture processes and finalizes each path independently. In a more optimized renderer, we might batch similar paint operations to reduce state changes and improve GPU utilization. However, this would require more complex state management and potentially more memory usage to track all paths before rendering. Also, for the web environment, we are focusing on the simplest solution, which is single-threaded.
let strip = GpuStrip { | ||
x: tile_x as u16, | ||
y: tile_y as u16, | ||
width: WIDE_TILE_WIDTH as u16, | ||
dense_width: 0, | ||
col: 0, | ||
rgba: bg, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing that up! Would you prefer me to add it now, or wait until the issue arises so that we can fix it along with any other potential related cases?
width: fill.width as u16, | ||
dense_width: 0, | ||
col: 0, | ||
rgba: color.to_rgba8().to_u32(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Fixed in 4485f2e
width: cmd_strip.width as u16, | ||
dense_width: cmd_strip.width as u16, | ||
col: cmd_strip.alpha_ix as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, your current encoding would allow merging alpha fills and sparse fills into one command, meanings paths would be drawn with at most two vertex instances per row per wide tile
Could you clarify this part? Are you referring to a more advanced batching logic?
// Fallback, should never happen | ||
default: { return rgba.x; } | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a final newline. You can normally configure your editor to add these automatically.
Unfortunately, I'm not aware of a good formatter for WGSL files, so this can't really be checked on CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my bad! I missed that.
You can normally configure your editor to add these automatically.
Yeah, I recently switched to vscode for some projects and haven’t configured everything yet.
Fixed in 93038ab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've mentioned a blocking concern in #gpu > Vello Hybrid (of incorrect rendering after the first frame). Once we get that resolved, I'm happy for this to land from my perspective.
The GPU resource management story is something that makes me quite sad (because the existing state in Vello is not great, and that's been brought in here); there are quite a few changes I want to make here. However, those are better as follow-ups.
} | ||
|
||
/// Fill a rectangle with the current paint and fill rule. | ||
pub fn fill_rect(&mut self, rect: &Rect) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we re-expect kurbo so that consumers don't need to pull it in themselves? e.g.
pub use peniko::kurbo;
} | ||
|
||
/// Prepare the GPU buffers for rendering | ||
pub fn prepare( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn prepare( | |
fn prepare( |
Currently this is only called in render_to_texture
. Maybe we should keep it private until we think it should be exposed?
|
||
/// Options for the renderer | ||
#[derive(Debug)] | ||
pub struct RendererOptions {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of providing a default implementation for RendererOptions?
Based on #818: This PR adds a new hybrid sparse strip CPU/GPU renderer (
vello_hybrid
). The CPU handles path processing and initial geometry setup, while the GPU accelerates compositing and blending operations. The architecture consists ofRenderContext
for rendering operations,Renderer
for GPU resource management, andRenderData
container for processed geometry. The implementation supports both windowed and headless rendering, includes various examples, and moves some common functionality to shared modules.# Render svg to file cargo run -p vello_hybrid --example render_to_file examples/assets/Ghostscript_Tiger.svg target/Ghostscript_Tiger_VH.png
# Render svg to window cargo run -p vello_hybrid --example simple cargo run -p vello_hybrid --example svg examples/assets/Ghostscript_Tiger.svg