Remove line packing from tiles, slightly improve tile culling #845

tomcur · 2025-03-10T14:52:46Z

This reduces the tile memory footprint and drops the requirement for tile x-coordinates to be signed. We may be able to drop the TileIndex indirection soon.

Coarse winding calculation is performed in tile generation, like it more-or-less was before. In principle it requires just a single bit in the tile packing. Calculating the coarse winding could be performed in the strip generation as well, but I suspect it to be slightly more performant this way. I haven't accurately measured that.

This is one step towards what Raph mentioned. Quoting:

> pack x, y, line id, and winding number delta into a single u32 and sort that. This is a very tight packing with u32 and may overflow in some cases (very large number of lines per path), but I think should be very efficient. Then the determination of p0 and p1 within the tile happens after sorting. A major motivation for this change is that it can be done on GPU in hybrid modes, but the approach to parallelism also works with SIMD, I think.
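
To make the sort order in the quote concrete, here is a minimal sketch of such a packing. The bit widths (8 bits each for y and x, 15 bits for the line index, 1 bit for winding) and the names `pack_tile` and `unpack_tile` are assumptions for illustration only, not the layout used in vello. Placing y in the highest bits means a plain integer sort orders tiles row-major, then by line index.

```rust
// Hypothetical bit budget: 8 + 8 + 15 + 1 = 32 bits.
const Y_BITS: u32 = 8;
const X_BITS: u32 = 8;
const LINE_BITS: u32 = 15;
const WINDING_BITS: u32 = 1;

/// Pack tile coordinates, line index and the winding flag into one sortable key.
fn pack_tile(x: u32, y: u32, line_idx: u32, winding: bool) -> u32 {
    debug_assert!(x < (1 << X_BITS) && y < (1 << Y_BITS) && line_idx < (1 << LINE_BITS));
    (y << (X_BITS + LINE_BITS + WINDING_BITS))
        | (x << (LINE_BITS + WINDING_BITS))
        | (line_idx << WINDING_BITS)
        | (winding as u32)
}

/// Recover `(x, y, line_idx, winding)` from a packed key.
fn unpack_tile(packed: u32) -> (u32, u32, u32, bool) {
    let winding = (packed & 1) == 1;
    let line_idx = (packed >> WINDING_BITS) & ((1 << LINE_BITS) - 1);
    let x = (packed >> (LINE_BITS + WINDING_BITS)) & ((1 << X_BITS) - 1);
    let y = packed >> (X_BITS + LINE_BITS + WINDING_BITS);
    (x, y, line_idx, winding)
}
```

Sorting a `Vec<u32>` of such keys with `sort_unstable()` then groups all entries of one tile together, and the line endpoints can be looked up through `line_idx` afterwards.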

Tiles above, to the right and below the viewport are now culled. Tiles to the left of the viewport are clamped but not culled yet. Currently strip generation has some code special-casing those tiles.
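
The culling rule described here can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `tile_ranges` is a hypothetical helper, tiles are taken to be 4x4 pixels, and the viewport is given as a number of tile columns and rows.

```rust
use std::ops::Range;

const TILE_WIDTH: f32 = 4.0;
const TILE_HEIGHT: f32 = 4.0;

/// Returns the (column, row) tile ranges a line touches, or `None` if it is culled.
/// Lines above, below, or to the right of the viewport are culled. Lines to the
/// left are clamped to column 0 instead, because they still affect winding.
fn tile_ranges(
    p0: (f32, f32),
    p1: (f32, f32),
    cols: u16,
    rows: u16,
) -> Option<(Range<u16>, Range<u16>)> {
    let (x_min, x_max) = (p0.0.min(p1.0), p0.0.max(p1.0));
    let (y_min, y_max) = (p0.1.min(p1.1), p0.1.max(p1.1));
    let width = cols as f32 * TILE_WIDTH;
    let height = rows as f32 * TILE_HEIGHT;

    if y_max < 0.0 || y_min >= height || x_min >= width {
        return None;
    }

    // Float-to-integer `as` casts saturate in Rust, so large values stay in range.
    let col_start = (x_min.max(0.0) / TILE_WIDTH) as u16;
    let col_end = ((x_max.max(0.0) / TILE_WIDTH) as u16).saturating_add(1).min(cols);
    let row_start = (y_min.max(0.0) / TILE_HEIGHT) as u16;
    let row_end = ((y_max / TILE_HEIGHT) as u16).saturating_add(1).min(rows);
    Some((col_start..col_end, row_start..row_end))
}
```

With an 8x8-tile viewport, a line running from `(-10, 1)` to `(-5, 2)` is kept and clamped to column 0, while a line at `x >= 32` is dropped.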

On my machine, paris-30k measures as follows, with #835 as the "before" and the changes here as the "after". (With 20 iterations, the timings appear to be accurate to about 0.2ms, but we should think about better benchmarking.)

Tile generation
    old: 24.394854ms
    new: 17.620834ms
Tile sorting
    old: 8.922622ms
    new: 8.146455ms
Strip generation
    old: 33.187645ms
    new: 31.825052ms

tomcur · 2025-03-10T15:01:45Z

sparse_strips/vello_common/src/strip.rs

+        let line = lines[tile.line_idx as usize];
+        let tile_left_x = tile.x as f32 * Tile::WIDTH as f32;
+        let tile_top_y = tile.y as f32 * Tile::HEIGHT as f32;
+        let p0_x = line.p0.x - tile_left_x;
+        let p0_y = line.p0.y - tile_top_y;
+        let p1_x = line.p1.x - tile_left_x;
+        let p1_y = line.p1.y - tile_top_y;


With these calculations placed at the front of the loop, strip generation measures about 1-2% faster for paris-30k on my machine.

tomcur · 2025-03-10T15:02:39Z

sparse_strips/vello_common/src/tile.rs

+    /// Whether the line crosses the top edge of the tile.
+    ///
+    /// Lines making this crossing increment or decrement the coarse tile winding, depending on the
+    /// line direction.
+    pub winding: bool,


This can become a single bit in a tight packing.

Before, we had a delta which can be -1, 0, 1. Am I correct in the assumption that winding == true corresponds to either -1/1, the exact delta being determined after the sorting?

Yes, false corresponds to 0, and true corresponds to -1 if the line is oriented down and 1 if the line is oriented up. In this PR the sign is resolved during strip generation.

Oooh, that's a neat trick, I love it. I was thinking along similar lines for the MSAA winding number accumulation in the sparse strip design (see doc).
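
As a small illustration of the mapping discussed in this thread, the sign can be recovered from the line's endpoints once the flag is known. The function name `winding_delta` is hypothetical, and the sketch assumes a y-down coordinate system, so "oriented up" means `p1_y < p0_y`.

```rust
/// Resolve the coarse winding delta from the one-bit flag and the line direction.
/// `crosses_top` plays the role of `Tile::winding`. Assumes y grows downward.
fn winding_delta(crosses_top: bool, p0_y: f32, p1_y: f32) -> i32 {
    if !crosses_top {
        // The line does not cross the tile's top edge: no coarse winding change.
        return 0;
    }
    if p1_y < p0_y {
        1 // oriented up
    } else {
        -1 // oriented down
    }
}
```

A horizontal line never crosses a top edge, so it always takes the `0` branch.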

tomcur · 2025-03-10T15:05:11Z

sparse_strips/vello_common/src/tile.rs


 /// A tile represents an aligned area on the pixmap, used to subdivide the viewport into sub-areas
 /// (currently 4x4) and analyze line intersections inside each such area.
 ///
 /// Keep in mind that it is possible to have multiple tiles with the same index,
 /// namely if we have multiple lines crossing the same 4x4 area!
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, Copy)]
 pub struct Tile {
    /// The index of the tile in the x direction.
    pub x: i32,


This should become u16, probably as part of this PR, which I'd prefer doing as a separate commit (many non-interesting changes will follow from that). With this PR currently being based on top of another PR, I'll hold off on that for a bit to keep things clear.

The end goal is that a Tile can be represented by just u32, right? This would mean that we need to first check whether any of the components can overflow, and if so use a "long code path" that doesn't contain this tight packing, right? In practice this would hopefully never be necessary unless there are a lot of lines in a single path or x/y are very big, but for correctness it would still be necessary to have.

I think this is best answered by @raphlinus: I had the same question.

Yes, you'll overflow the u32 packing if the line count exceeds the space available, which I think will be rare, but there would need to be a fallback as it's not impossible.

I guess a major empirical question is whether sorting of u32's is actually much faster than u64's. If not, then it's probably better to avoid the complexity of having two code paths. This question is potentially going to depend on a lot of things, like the architecture it's running on and whether we try to do alternative sorting strategies like radix sort.
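
For reference, the check for choosing between the two code paths could look something like the sketch below. Everything here is an assumption for illustration: the bit budget, the names `fits_tight_packing` and `pack_wide`, and the 64-bit layout are not taken from vello.

```rust
// Hypothetical bit budget of the tight u32 key: 8 + 8 + 15 + 1 bits.
const X_BITS: u32 = 8;
const Y_BITS: u32 = 8;
const LINE_BITS: u32 = 15;

/// Whether every tile of a path can be encoded in the tight u32 key.
fn fits_tight_packing(cols: u32, rows: u32, line_count: usize) -> bool {
    cols <= (1u32 << X_BITS) && rows <= (1u32 << Y_BITS) && line_count <= (1usize << LINE_BITS)
}

/// Fallback key: the same field order, widened to u64 so it sorts identically.
fn pack_wide(x: u16, y: u16, line_idx: u32, winding: bool) -> u64 {
    debug_assert!(line_idx < (1 << 31));
    ((y as u64) << 48) | ((x as u64) << 32) | ((line_idx as u64) << 1) | (winding as u64)
}
```

Whether keeping both paths is worthwhile depends on the empirical question raised above, namely how much faster sorting `u32` keys is than sorting `u64` keys.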

LaurenzV

I think the proper review will wait until the previous PR has been merged, for simplicity, but I have just a couple of (perhaps dumb) questions. Otherwise, the overall direction seems sensible to me!

LaurenzV · 2025-03-10T15:43:27Z

sparse_strips/vello_common/src/tile.rs


 /// A tile represents an aligned area on the pixmap, used to subdivide the viewport into sub-areas
 /// (currently 4x4) and analyze line intersections inside each such area.
 ///
 /// Keep in mind that it is possible to have multiple tiles with the same index,
 /// namely if we have multiple lines crossing the same 4x4 area!
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, Copy)]
 pub struct Tile {
    /// The index of the tile in the x direction.
    pub x: i32,


The end goal is that a Tile can be represented by just u32, right? This would mean that we need to first check whether any of the components can overflow, and if so use a "long code path" that doesn't contain this tight packing, right? In practice this would hopefully never be necessary unless there are a lot of lines in a single path or x/y are very big, but for correctness it would still be necessary to have.

LaurenzV · 2025-03-10T15:46:01Z

sparse_strips/vello_common/src/tile.rs

+    /// Whether the line crosses the top edge of the tile.
+    ///
+    /// Lines making this crossing increment or decrement the coarse tile winding, depending on the
+    /// line direction.
+    pub winding: bool,


Before, we had a delta which can be -1, 0, 1. Am I correct in the assumption that winding == true corresponds to either -1/1, the exact delta being determined after the sorting?

LaurenzV · 2025-03-10T15:46:58Z

sparse_strips/vello_common/src/tile.rs

-        let nudge_point = |p: Point| -> Point {
-            // Lines that cross vertical tile boundaries need special treatment during
-            // anti-aliasing. This case is detected via tile-relative x == 0. However,
-            // lines can naturally start or end at a multiple of the 4x4 grid, too, but
-            // these don't constitute crossings. We nudge these points ever so slightly,
-            // by ensuring that xfrac0 and xfrac1 are always at least  1/8192 of a pixel.
-            // By doing so, whenever we encounter a point
-            // at a tile relative 0, we can treat it as an edge crossing. This is somewhat
-            // of a hack and in theory we should rather solve the underlying issue in the
-            // strip generation code, but it works for now.
-            if p.x.fract() == 0.0 {
-                Point {
-                    x: p.x + SCALED_X_NUDGE_FACTOR,
-                    y: p.y,
-                }
-            } else {
-                p
-            }
-        };


All of this is not necessary anymore due to your new strip generation algorithm?

Indeed. I've attempted to encode as many of the edge-cases into IEEE 754 float semantics as I could (mostly around inf and nan), to have as few branches and instructions as possible. It could perhaps be improved more, but I haven't found that yet.

LaurenzV · 2025-03-10T15:49:59Z

sparse_strips/vello_common/src/tile.rs

+    // TODO: Tiles are clamped to the left edge of the viewport, but lines fully to the left of the
+    // viewport are not culled yet. These lines impact winding, and would need forwarding of
+    // winding to the strip generation stage.


Above you mentioned x can be stored as u16, so how do you distinguish between lines at 0 and lines strictly to the left?

This PR adds logic to strip generation to check for left-of-viewport lines. With some plumbing of winding/coverage from earlier pipeline stages into strip generation, those checks are best moved back to tile generation in a follow-up PR. I believe Raph would like to cull some geometry even earlier than tile generation in future changes, which might be able to use some of that same plumbing.

LaurenzV · 2025-03-10T15:51:42Z

sparse_strips/vello_common/src/tile.rs

+                (p1_y, p1_x, p0_y, p0_x)
+            };
+
+            // For ease of logic, special-case purely vertical tiles.


Are purely horizontal lines not worth special-casing?

They are worth special-casing. Horizontal geometry is parallel to the winding scan direction, and so doesn't impact winding. We should evaluate removing them entirely. That does most likely entail adding a sparse alpha mask draw command to the wide tiles.

LaurenzV · 2025-03-10T15:52:05Z

sparse_strips/vello_common/src/tile.rs

-
-                    last_packed = Point::new(packed.x + flip, packed.y);
+            } else {
+                let x_slope = (p1_x - p0_x) / (p1_y - p0_y);


Can this not be NaN for horizontal lines?

No, but it can be inf. The min and max below take care of that. Admittedly, I haven't verified that too closely since horizontal lines don't impact geometry and can be elided, which could be done shortly after this lands in a follow-up PR.
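
A short sketch of the float behavior being relied on. This is not the code from the PR: `x_slope` and `clamped_x_at` are hypothetical helpers, and the tile width of 4 is an assumption. For a horizontal line with distinct x-coordinates the division yields an infinity, and `min`/`max` then bring the result back into the tile. Rust's `f32::min` and `f32::max` return the other operand when one operand is NaN, so a NaN intermediate does not survive the clamp either.

```rust
const TILE_WIDTH: f32 = 4.0;

/// Change in x per unit of y. Infinite for a horizontal line with distinct x-coordinates.
fn x_slope(p0: (f32, f32), p1: (f32, f32)) -> f32 {
    (p1.0 - p0.0) / (p1.1 - p0.1)
}

/// The line's x-coordinate at height `y`, clamped to the tile's horizontal extent.
fn clamped_x_at(p0: (f32, f32), p1: (f32, f32), y: f32) -> f32 {
    let x = p0.0 + (y - p0.1) * x_slope(p0, p1);
    // An infinite `x` is clamped to the tile edge. A NaN `x` (from 0 * inf)
    // becomes 0.0, because `max` returns the non-NaN operand.
    x.max(0.0).min(TILE_WIDTH)
}
```

No branch is needed for the horizontal case: the clamp absorbs both the infinite and the NaN intermediate values.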

tomcur · 2025-03-10T16:12:05Z

Spot-checking some pixels, the changed tests all appear to differ only in how the pixel coverage rounds. I haven't checked too closely, but it's possible the area coverage calculation is, by coincidence, slightly more symmetrical now: compare, e.g., the left and right edges of eo_filling_missing_anti_aliasing.png before and after these changes.

tomcur · 2025-03-10T16:40:58Z

sparse_strips/vello_toy/src/debug.rs

+        let line = line_buf[tile.line_idx as usize];
+
+        // TODO: how to handle line intersections now lines are not explicitly segmented by tile
+        // generation anymore?
+        let p0 = Point {
+            x: line.p0.x - x as f32,
+            y: line.p0.y - y as f32,
+        };
+        let p1 = Point {
+            x: line.p1.x - x as f32,
+            y: line.p1.y - y as f32,
+        };


This is a bit hacky at the moment to get tests to pass. Lines are no longer explicitly segmented per tile, so this draws the same line multiple times. @LaurenzV do you have some ideas about this? Segmenting manually here might be a bit misleading, as with the code in this PR that doesn't actually happen internally.

I'm fine with removing the tile intersection points if the underlying algorithm is different now.

I've removed the intersections, but am opening a follow-up soon after this.

raphlinus

This looks good, and the measured time changes justify the change. It's also consistent with changes I proposed for the sorting.

There's definitely more work to be done to get to the end state, but this seems like a good step.

This reduces the tile memory footprint and drops the requirement for tile x-coordinates to be signed. We may be able to drop the `TileIndex` indirection soon.

Coarse winding calculation is performed in tile generation, like it more-or-less was before. In principle it requires just a single bit in the tile packing. Calculating the coarse winding could be performed in the strip generation as well, but I suspect it to be slightly more performant this way. I haven't accurately measured that.

This is along the same lines as what [Raph mentioned](https://xi.zulipchat.com/#narrow/channel/197075-gpu/topic/CPU.20sparse.20strip.20rendering.20to.20pixels/near/500019583). Quoting:

> pack x, y, line id, and winding number delta into a single u32 and
> sort that. This is a very tight packing with u32 and may overflow in
> some cases (very large number of lines per path), but I think should be
> very efficient. Then the determination of p0 and p1 within the tile
> happens after sorting. A major motivation for this change is that it can
> be done on GPU in hybrid modes, but the approach to parallelism also
> works with SIMD, I think.

Tiles above, to the right and below the viewport are now culled. Tiles to the left of the viewport are clamped but not culled yet. Currently strip generation has some code special-casing those tiles.

On my machine, paris-30k measures before and after the changes as the following. (With 20 iterations, probably accurate to about ~0.2ms, but we should think about better benchmarking.)

```
Tile generation
    old: 24.394854ms
    new: 17.620834ms
Tile sorting
    old: 8.922622ms
    new: 8.146455ms
Strip generation
    old: 33.187645ms
    new: 31.825052ms
```

tomcur · 2025-03-11T11:01:48Z

To save some effort, I think cleaning up the tile x-coordinate data type and packing the tile more tightly are best done together in a separate PR.

#845 removed line-tile intersections from the debug rendering, as these are no longer present in the internal representation. However, that also removed a way to see whether lines are directed upwards or downwards, which is important to understand their effect on winding.

This proposes coloring the lines according to their orientation: lines oriented upwards add to winding and are colored green. Lines oriented downwards subtract from winding and are colored red. Horizontal lines don't affect winding, and will with future changes likely not generate any tiles at all. These are colored grey.

An example of what that looks like:

```shell
cargo run --bin debug -- --path "M 5 5 L 40 23 L 20 43.5 L 1 43.5 Z" --stages ls,ta
```

![debug](https://github.com/user-attachments/assets/ee4600a8-eb03-4cef-9796-a86cafec9526)
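
The coloring rule can be sketched as a small classification function. The names `LineColor` and `line_color` are hypothetical and not taken from `vello_toy`. The sketch assumes a y-down coordinate system, so a line is oriented upwards when its end point has the smaller y-coordinate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LineColor {
    /// Oriented upwards: adds to winding.
    Green,
    /// Oriented downwards: subtracts from winding.
    Red,
    /// Horizontal: no effect on winding.
    Grey,
}

/// Classify a line by the y-coordinates of its start and end points (y grows downward).
fn line_color(p0_y: f32, p1_y: f32) -> LineColor {
    if p1_y < p0_y {
        LineColor::Green
    } else if p1_y > p0_y {
        LineColor::Red
    } else {
        LineColor::Grey
    }
}
```

Under that assumption, the four segments of the example path `M 5 5 L 40 23 L 20 43.5 L 1 43.5 Z` would classify as red, red, grey, and green (the closing segment).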

dominikh · 2025-03-12T00:30:08Z

sparse_strips/vello_common/src/strip.rs

+                let h = (ymax - ymin).max(0.);
+                accumulated_winding[y_idx as usize] += sign * h;
+
+                for x_idx in 0..Tile::HEIGHT {


@tomcur should this have said Tile::WIDTH?

Yes, thank you! Opened #848.
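
To show why this slip was easy to miss, here is a sketch with deliberately non-square tiles. The constants, the column-major array layout, and the name `add_to_row` are assumptions for illustration. With the current 4x4 tiles, `WIDTH` and `HEIGHT` are equal, so looping over the wrong constant produces identical results.

```rust
// Deliberately non-square so that mixing up the two constants becomes visible.
const TILE_WIDTH: usize = 8;
const TILE_HEIGHT: usize = 4;

/// Add `value` to every column of row `y_idx`. The x loop must range over the
/// tile width; ranging over the height would skip columns 4..8 here.
fn add_to_row(area: &mut [[f32; TILE_HEIGHT]; TILE_WIDTH], y_idx: usize, value: f32) {
    for x_idx in 0..TILE_WIDTH {
        area[x_idx][y_idx] += value;
    }
}
```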

Missed in #845, spotted by @dominikh (see the review comment on #845).

Co-authored-by: Dominik Honnef <dominik@honnef.co>

tomcur force-pushed the unpack-lines branch from 6f9331a to 2a695b3 Compare March 10, 2025 14:54

tomcur requested review from raphlinus and LaurenzV March 10, 2025 15:05

tomcur changed the title ~~Remove line packing from tiles, slightly improve culling~~ Remove line packing from tiles, slightly improve tile culling Mar 10, 2025

tomcur force-pushed the unpack-lines branch from 2a695b3 to 47a3c7d Compare March 10, 2025 15:47

LaurenzV reviewed Mar 10, 2025

tomcur force-pushed the unpack-lines branch from 47a3c7d to 33a61c1 Compare March 10, 2025 16:02

raphlinus approved these changes Mar 10, 2025

tomcur and others added 2 commits March 11, 2025 11:02

Fix vello_toy

6dc2069

tomcur force-pushed the unpack-lines branch from 33a61c1 to 6dc2069 Compare March 11, 2025 10:15

vello_toy: remove tile intersections

fad03df

tomcur added this pull request to the merge queue Mar 11, 2025

Merged via the queue into main with commit 4a75ece Mar 11, 2025
17 checks passed

tomcur deleted the unpack-lines branch March 11, 2025 11:04

tomcur mentioned this pull request Mar 11, 2025

vello_toy: Color lines according to their winding sign #846

Merged

dominikh reviewed Mar 12, 2025

tomcur mentioned this pull request Mar 12, 2025

Fix x-dimension range in winding loop #848

Merged

github-merge-queue bot pushed a commit that referenced this pull request Mar 12, 2025

Fix x-dimension range in winding loop (#848)

b05dea1
