Require BufRead + Seek bound for decoding #558

fintelia · 2024-12-25T23:30:52Z

The addition of a BufRead bound was initially proposed in #427. At the time, the main image crate did not require that bound and the performance implications were uncertain. We've now done a major release of the image crate with suitable API updates, and some early experiments have shown that refactoring taking advantage of BufRead can yield modest performance improvement. At the moment, the only actual change related to BufRead is that we directly use the provided reader rather than wrapping in a BufReader.

Seek isn't used yet, but it will enable possible future improvements like rewinding the decoder when an animation finishes or delaying the reading of large EXIF / ICC / text chunks unless or until the user queries them.

Overall the performance difference on real-world images seems to be in the noise, but there's a ~10% improvement on the synthetic noncompressed benchmarks.

The main user-facing impact from this change is some changes when constructing a decoder...

If you pass a File object, wrap it in a BufReader:
```
Decoder::new(BufReader::new(File::open(path)?)
```
If you have a byte slice, wrap it in a Cursor:
```
Decoder::new(Cursor::new(&bytes))
```

kornelski · 2024-12-26T03:17:34Z

The BufRead bound makes sense. Sadly, libstd doesn't have specialization magic to avoid double buffering of slices and vecs, so it's better to take BufRead than to redundantly alloc BufReader. I do worry that people will forget to buffer File, so I'd also add path-based convenience methods to mitigate the risk.

But the blanket Seek bound seems inelegant to me. It requires a Cursor where raw slices used to work. It's not as widely compatible as Read or BufRead, since not everything is seekable (tarball entry, network stream). This seems like a significant API downgrade, and it's strange to do it so casually and speculatively, when there isn't even anything showing it's necessary and worth the downside.

The PNG format has been carefully designed to not require seeking, so Seek on its codec rubs me the wrong way.

Could the Seek bound be limited to only methods that actually require it? e.g.

fn rewind_animation(&mut self) where R: Seek

The decoder could support skipping of chunks better, but that doesn't require Seek. The unwanted chunk data could be read and discarded. I bet that the difference between read-and-discard and seeking forward over a portion of a file is negligible at the scale of bytes to kilobytes that chunks have in non-pathological cases.

fintelia · 2024-12-31T02:37:23Z

The problem with skipping chunks is knowing what to skip. If the user later asks for a chunk and you skipped it earlier, it is already too late. We've already picked up a couple ugly set_ignore_xxx_chunk methods that the user can call before decoding, but I don't think that's really the best model. If nothing else, it doesn't work well with the ImageDecoder interface we have in the main image crate.

At the moment, the use-case for Seek that I'm most focused on is moving away from the byte-at-time state machine we have now while still being able to resume after an EOF.

kornelski · 2024-12-31T13:42:40Z

I think it'd be reasonable to have user specify what chunks they're interested in before parsing.

Applications either just want pixels, and don't care about the metadata, or can list chunks they have support for, or just want all chunks to roundtrip the file.

This doesn't have to be a bunch of bespoke methods and booleans. It can be a list of chunk names and all/none special case.

fintelia added 2 commits December 25, 2024 14:35

Require BufRead + Seek bound for decoding

4726be6

Fix decode and roundtrip fuzz targets

8064e22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Require BufRead + Seek bound for decoding #558

Require BufRead + Seek bound for decoding #558

fintelia commented Dec 25, 2024

kornelski commented Dec 26, 2024 •

edited

Loading

fintelia commented Dec 31, 2024

kornelski commented Dec 31, 2024

Require BufRead + Seek bound for decoding #558

Are you sure you want to change the base?

Require BufRead + Seek bound for decoding #558

Conversation

fintelia commented Dec 25, 2024

kornelski commented Dec 26, 2024 • edited Loading

fintelia commented Dec 31, 2024

kornelski commented Dec 31, 2024

kornelski commented Dec 26, 2024 •

edited

Loading