-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Remove line packing from tiles, slightly improve tile culling #845
Conversation
let line = lines[tile.line_idx as usize]; | ||
let tile_left_x = tile.x as f32 * Tile::WIDTH as f32; | ||
let tile_top_y = tile.y as f32 * Tile::HEIGHT as f32; | ||
let p0_x = line.p0.x - tile_left_x; | ||
let p0_y = line.p0.y - tile_top_y; | ||
let p1_x = line.p1.x - tile_left_x; | ||
let p1_y = line.p1.y - tile_top_y; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these calculations placed at the front of the loop, strip generation measures at ~1-2% faster for paris-30k on my machine.
/// Whether the line crosses the top edge of the tile. | ||
/// | ||
/// Lines making this crossing increment or decrement the coarse tile winding, depending on the | ||
/// line direction. | ||
pub winding: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can become a single bit in a tight packing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, we had a delta which can be -1, 0, 1. Am I correct in the assumption that winding == true
corresponds to either -1/1, the exact delta being determined after the sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, false
corresponds to 0
, and true
corresponds to -1
if the line is oriented down and 1
if the line is oriented up. In this PR the sign is resolved during strip generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, that's a neat trick, I love it. I was thinking along similar lines for the MSAA winding number accumulation in the sparse strip design (see doc).
|
||
/// A tile represents an aligned area on the pixmap, used to subdivide the viewport into sub-areas | ||
/// (currently 4x4) and analyze line intersections inside each such area. | ||
/// | ||
/// Keep in mind that it is possible to have multiple tiles with the same index, | ||
/// namely if we have multiple lines crossing the same 4x4 area! | ||
#[derive(Debug, Clone)] | ||
#[derive(Debug, Clone, Copy)] | ||
pub struct Tile { | ||
/// The index of the tile in the x direction. | ||
pub x: i32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should become u16
, probably as part of this PR, which I'd prefer doing as a separate commit (many non-interesting changes will follow from that). With this PR currently being based on top of another PR, I'll hold off on that for a bit to keep things clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end goal is that a Tile
can be represented by just u32, right? This would mean that we need to first check whether any of the components can overflow, and if so use a "long code path" that doesn't contain this tight packing, right? In practice this would hopefully never be necessary unless there are a lot of lines in a single path or x/y are very big, but for correctness it would still be necessary to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is best answered by @raphlinus: I had the same question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you'll overflow the u32 packing if the line count exceeds the space available, which I think will be rare, but there would need to be a fallback as it's not impossible.
I guess a major empirical question is whether sorting of u32's is actually much faster than u64's. If not, then it's probably better to avoid the complexity of having two code paths. This question is potentially going to depend on a lot of things, like the architecture it's running on and whether we try to do alternative sorting strategies like radix sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proper review will wait until the previous PR has been merged for easiniess, but just a couple of (perhaps dumb) questions from my side. Otherwise, the overall direction seems sensible to me!
|
||
/// A tile represents an aligned area on the pixmap, used to subdivide the viewport into sub-areas | ||
/// (currently 4x4) and analyze line intersections inside each such area. | ||
/// | ||
/// Keep in mind that it is possible to have multiple tiles with the same index, | ||
/// namely if we have multiple lines crossing the same 4x4 area! | ||
#[derive(Debug, Clone)] | ||
#[derive(Debug, Clone, Copy)] | ||
pub struct Tile { | ||
/// The index of the tile in the x direction. | ||
pub x: i32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end goal is that a Tile
can be represented by just u32, right? This would mean that we need to first check whether any of the components can overflow, and if so use a "long code path" that doesn't contain this tight packing, right? In practice this would hopefully never be necessary unless there are a lot of lines in a single path or x/y are very big, but for correctness it would still be necessary to have.
/// Whether the line crosses the top edge of the tile. | ||
/// | ||
/// Lines making this crossing increment or decrement the coarse tile winding, depending on the | ||
/// line direction. | ||
pub winding: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, we had a delta which can be -1, 0, 1. Am I correct in the assumption that winding == true
corresponds to either -1/1, the exact delta being determined after the sorting?
let nudge_point = |p: Point| -> Point { | ||
// Lines that cross vertical tile boundaries need special treatment during | ||
// anti-aliasing. This case is detected via tile-relative x == 0. However, | ||
// lines can naturally start or end at a multiple of the 4x4 grid, too, but | ||
// these don't constitute crossings. We nudge these points ever so slightly, | ||
// by ensuring that xfrac0 and xfrac1 are always at least 1/8192 of a pixel. | ||
// By doing so, whenever we encounter a point | ||
// at a tile relative 0, we can treat it as an edge crossing. This is somewhat | ||
// of a hack and in theory we should rather solve the underlying issue in the | ||
// strip generation code, but it works for now. | ||
if p.x.fract() == 0.0 { | ||
Point { | ||
x: p.x + SCALED_X_NUDGE_FACTOR, | ||
y: p.y, | ||
} | ||
} else { | ||
p | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this is not necessary anymore due to your new strip generation algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I've attempted to encode as many of the edge-cases into IEEE 754 float semantics as I could (mostly around inf
and nan
), to have as few branches and instructions as possible. It could perhaps be improved more, but I haven't found that yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
// TODO: Tiles are clamped to the left edge of the viewport, but lines fully to the left of the | ||
// viewport are not culled yet. These lines impact winding, and would need forwarding of | ||
// winding to the strip generation stage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above you mentioned x can be stored as u16, so how do you distinguish between lines at 0 and lines strictly to the left?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds logic to strip generation to check for left-of-viewport lines. With some plumbing of winding/coverage from earlier pipeline stages into strip generation, those checks are best moved back to tile generation in a follow-up PR. I believe Raph would like to cull some geometry even earlier than tile generation in future changes, which might be able to use some of that same plumbing.
(p1_y, p1_x, p0_y, p0_x) | ||
}; | ||
|
||
// For ease of logic, special-case purely vertical tiles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are purely horizontal lines not worth special-casing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are worth special-casing. Horizontal geometry is parallel to the winding scan direction, and so doesn't impact winding. We should evaluate removing them entirely. That does most likely entail adding a sparse alpha mask draw command to the wide tiles.
|
||
last_packed = Point::new(packed.x + flip, packed.y); | ||
} else { | ||
let x_slope = (p1_x - p0_x) / (p1_y - p0_y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this not be NaN for horizontal lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but it can be inf
. The min
and max
below take care of that. Admittedly, I haven't verified that too closely since horizontal lines don't impact geometry and can be elided, which could be done shortly after this lands in a follow-up PR.
Spot-checking some pixels, the changed tests all appear to just be the rounding of pixel coverage falling differently. I haven't checked too closely, but it's possible the area coverage calculation is by coincidence slightly more symmetrical now, check e.g. the left and right edges of |
sparse_strips/vello_toy/src/debug.rs
Outdated
let line = line_buf[tile.line_idx as usize]; | ||
|
||
// TODO: how to handle line intersections now lines are not explicitly segmented by tile | ||
// generation anymore? | ||
let p0 = Point { | ||
x: line.p0.x - x as f32, | ||
y: line.p0.y - y as f32, | ||
}; | ||
let p1 = Point { | ||
x: line.p1.x - x as f32, | ||
y: line.p1.y - y as f32, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit hacky at the moment to get tests to pass. Lines are no longer segmented per tile explicitly, so this is drawing the same line multiple tiles. @LaurenzV do you have some ideas about this? Segmenting here manually might be a bit misleading, as with the code in this PR that doesn't actually happen internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with removing the tile intersection points if the underlying algorithm is different now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the intersections, but am opening a follow-up soon after this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, and the measured time changes justify the change. It's also consistent with changes I proposed for the sorting.
There's definitely more work to be done to get to the end state, but this seems like a good step.
This reduces the tile memory footprint and drops the requirement for tile x-coordinates to be signed. We may be able to drop the `TileIndex` indirection soon. Coarse winding calculation is performed in tile generation, like it more-or-less was before. In principle it requires just a single bit in the tile packing. Calculating the coarse winding could be performed in the strip generation as well, but I suspect it to be slightly more performant this way. I haven't accurately measured that. This is along the same lines as what [Raph mentioned](https://xi.zulipchat.com/#narrow/channel/197075-gpu/topic/CPU.20sparse.20strip.20rendering.20to.20pixels/near/500019583). Quoting: > pack x, y, line id, and winding number delta into a single u32 and > sort that. This is a very tight packing with u32 and may overflow in > some cases (very large number of lines per path), but I think should be > very efficient. Then the determination of p0 and p1 within the tile > happens after sorting. A major motivation for this change is that it can > be done on GPU in hybrid modes, but the approach to parallelism also > works with SIMD, I think. Tiles above, to the right and below the viewport are now culled. Tiles to the left of the viewport are clamped but not culled yet. Currently strip generation has some code special-casing those tiles. On my machine, paris-30k measures before and after the changes as the following. (With 20 iterations, probably accurate to about ~0.2ms, but we should think about better benchmarking.) ``` Tile generation old: 24.394854ms new: 17.620834ms Tile sorting old: 8.922622ms new: 8.146455ms Strip generation old: 33.187645ms new: 31.825052ms ```
To reduce some effort, tile x-coord data type clean up and packing the tile more tightly are best done together in a separate PR, I think. |
#845 removed line-tile intersections from the debug rendering, as these are no longer present in the internal representation. However, that also removed a way to see whether lines are directed upwards or downwards, which is important to understand their effect on winding. This proposes coloring the lines according to their orientation: lines oriented upwards add to winding and are colored green. Lines oriented downwards subtract from winding and are colored red. Horizontal lines don't affect winding, and will with future changes likely not generate any tiles at all. These are colored grey.
#845 removed line-tile intersections from the debug rendering, as these are no longer present in the internal representation. However, that also removed a way to see whether lines are directed upwards or downwards, which is important to understand their effect on winding. This proposes coloring the lines according to their orientation: lines oriented upwards add to winding and are colored green. Lines oriented downwards subtract from winding and are colored red. Horizontal lines don't affect winding, and will with future changes likely not generate any tiles at all. These are colored grey. An example of what that looks like: ```shell cargo run --bin debug -- --path "M 5 5 L 40 23 L 20 43.5 L 1 43.5 Z" --stages ls,ta ``` 
let h = (ymax - ymin).max(0.); | ||
accumulated_winding[y_idx as usize] += sign * h; | ||
|
||
for x_idx in 0..Tile::HEIGHT { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomcur should this have said Tile::WIDTH?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you! Opened #848.
Missed in #845, spotted by @dominikh ([comment](#845 (comment))). Co-authored-by: Dominik Honnef <dominik@honnef.co>
This reduces the tile memory footprint and drops the requirement for tile x-coordinates to be signed. We may be able to drop the
TileIndex
indirection soon.Coarse winding calculation is performed in tile generation, like it more-or-less was before. In principle it requires just a single bit in the tile packing. Calculating the coarse winding could be performed in the strip generation as well, but I suspect it to be slightly more performant this way. I haven't accurately measured that.
This is one step towards what Raph mentioned. Quoting:
Tiles above, to the right and below the viewport are now culled. Tiles to the left of the viewport are clamped but not culled yet. Currently strip generation has some code special-casing those tiles.
On my machine, paris-30k measures with #835 as the "before" and after the changes here as the following. (With 20 iterations, appears to be accurate to about ~0.2ms, but we should think about better benchmarking.)