Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Require BufRead + Seek bound for decoding #558

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fintelia
Copy link
Contributor

The addition of a BufRead bound was initially proposed in #427. At the time, the main image crate did not require that bound and the performance implications were uncertain. We've now done a major release of the image crate with suitable API updates, and some early experiments have shown that refactoring taking advantage of BufRead can yield modest performance improvement. At the moment, the only actual change related to BufRead is that we directly use the provided reader rather than wrapping in a BufReader.

Seek isn't used yet, but it will enable possible future improvements like rewinding the decoder when an animation finishes or delaying the reading of large EXIF / ICC / text chunks unless or until the user queries them.

Overall the performance difference on real-world images seems to be in the noise, but there's a ~10% improvement on the synthetic noncompressed benchmarks.


The main user-facing impact from this change is some changes when constructing a decoder...

  • If you pass a File object, wrap it in a BufReader:

    Decoder::new(BufReader::new(File::open(path)?)
  • If you have a byte slice, wrap it in a Cursor:

    Decoder::new(Cursor::new(&bytes))

@kornelski
Copy link
Contributor

kornelski commented Dec 26, 2024

The BufRead bound makes sense. Sadly, libstd doesn't have specialization magic to avoid double buffering of slices and vecs, so it's better to take BufRead than to redundantly alloc BufReader. I do worry that people will forget to buffer File, so I'd also add path-based convenience methods to mitigate the risk.

But the blanket Seek bound seems inelegant to me. It requires a Cursor where raw slices used to work. It's not as widely compatible as Read or BufRead, since not everything is seekable (tarball entry, network stream). This seems like a significant API downgrade, and it's strange to do it so casually and speculatively, when there isn't even anything showing it's necessary and worth the downside.

The PNG format has been carefully designed to not require seeking, so Seek on its codec rubs me the wrong way.

Could the Seek bound be limited to only methods that actually require it? e.g.

fn rewind_animation(&mut self) where R: Seek

The decoder could support skipping of chunks better, but that doesn't require Seek. The unwanted chunk data could be read and discarded. I bet that the difference between read-and-discard and seeking forward over a portion of a file is negligible at the scale of bytes to kilobytes that chunks have in non-pathological cases.

@fintelia
Copy link
Contributor Author

The problem with skipping chunks is knowing what to skip. If the user later asks for a chunk and you skipped it earlier, it is already too late. We've already picked up a couple ugly set_ignore_xxx_chunk methods that the user can call before decoding, but I don't think that's really the best model. If nothing else, it doesn't work well with the ImageDecoder interface we have in the main image crate.

At the moment, the use-case for Seek that I'm most focused on is moving away from the byte-at-time state machine we have now while still being able to resume after an EOF.

@kornelski
Copy link
Contributor

I think it'd be reasonable to have user specify what chunks they're interested in before parsing.

Applications either just want pixels, and don't care about the metadata, or can list chunks they have support for, or just want all chunks to roundtrip the file.

This doesn't have to be a bunch of bespoke methods and booleans. It can be a list of chunk names and all/none special case.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants