Streaming; PGO #19

Open
JoeUX opened this issue Oct 24, 2016 · 10 comments

Comments

@JoeUX

JoeUX commented Oct 24, 2016

Hi Eric – libdeflate is hitting incredible performance numbers now. Are you still opposed to a streaming implementation? It would come in very handy, especially since the other zlib forks (Cloudflare's, zlib-ng, Intel's) don't seem to be usable on typical servers like nginx.

Also, have you tried profile-guided optimization (PGO) in gcc? If not, do you anticipate any wins there? libdeflate is already so fast that there may not be much headroom left, but I've been reading up on PGO in gcc 5/6 and was curious. I'll probably try it at some point. I've also discovered the source code annotations that gcc supports, but I doubt they would make much difference here. (The final frontier might be STOKE: https://github.com/StanfordPL/stoke)

@ebiggers
Owner

I'd like to support streaming, but I'm not sure what the best way to do so is, other than the traditional zlib way. It needs more investigation.

I haven't had much luck with profile-guided optimizations in the past. I just did a basic test of them with libdeflate, but they didn't make much of a difference. I already use branch prediction hints a lot, so it's possible that PGO is redundant where it would matter most.

STOKE looks very interesting!

@nemequ

nemequ commented Nov 4, 2016

> I'd like to support streaming, but I'm not sure what the best way to do so is, other than the traditional zlib way. It needs more investigation.

Are you talking about the API or the internals? Assuming the former, zlib-style would make it very easy to port code from zlib to libdeflate…

@ebiggers
Owner

ebiggers commented Nov 5, 2016

Internals, mostly. I am familiar with zlib's API.

@nemequ

nemequ commented Nov 5, 2016

> Internals, mostly. I am familiar with zlib's API.

I thought you might have been considering an alternative to zlib's streaming API. For example, something more like zstd or density where you pass the input and output buffers as arguments instead of communicating through the stream struct, or callbacks for readers and writers like libzpaq, or libslz has some intriguing ideas to avoid unnecessary buffering…

@ebiggers
Owner

ebiggers commented Nov 5, 2016

Yes, other options for the API should be considered too.

@ghost

ghost commented Apr 28, 2020

It would be an advantage in some scenarios to be able to decompress in smaller blocks. I use libdeflate in a PNG decoder, and it would be more efficient to decompress progressively as the data becomes available (the working set would very often fit into L1). It's common for a PNG file to store the compressed data in multiple IDAT chunks, typically 8192-byte blocks (other sizes are also used, but this is easily the most common one). As it is, I have to make multiple passes over the data, which keeps the cache colder and increases temporary memory usage. I am the kind of dude who does care about these things, as the code is often deployed on 32-bit low-power devices; for desktop I don't care so much. :)
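As an illustration of the progressive approach described above, here is a minimal sketch using Python's standard `zlib` module (not libdeflate, which offers no streaming interface). The helper `inflate_idat_chunks` and the `window` parameter are hypothetical names, and the "IDAT chunks" are simply 8192-byte slices of one zlib stream; a real PNG decoder would also parse chunk headers, verify CRCs, and undo scanline filtering.

```python
import hashlib
import zlib

def inflate_idat_chunks(idat_chunks, window=4096):
    """Decompress chunk by chunk, capping each decompress() call at `window`
    output bytes so only a small piece is in flight at a time."""
    decoder = zlib.decompressobj()
    for chunk in idat_chunks:
        data = chunk
        while data:
            piece = decoder.decompress(data, window)
            if piece:
                yield piece
            # Input that could not be processed within the output cap.
            data = decoder.unconsumed_tail
    tail = decoder.flush()
    if tail:
        yield tail

# Poorly compressible test data so the stream spans several 8192-byte chunks.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2000))
compressed = zlib.compress(payload)
idat_chunks = [compressed[i:i + 8192] for i in range(0, len(compressed), 8192)]

restored = b"".join(inflate_idat_chunks(idat_chunks))
```

In a decoder, each yielded piece could be handed to the next stage immediately instead of being joined, which is what would keep the working set small.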

@ghost

ghost commented Apr 28, 2020

On the bright side, even with colder caches, libdeflate beats miniz and zlib by such a margin that it's still a net win. :)

@andrews05

Hi, has there been any further consideration of this? I.e. is it on your roadmap?

Btw, thanks heaps for making this fantastic library. Great to see the recent improvements for Apple Silicon.
Aside: Have you seen Efficient Compression Tool (ECT)? It has a performance curve similar to libdeflate's, but for levels beyond 12.

@ebiggers
Owner

I don't plan to add streaming support. I think there's no easy way to do it, and it doesn't make too much sense to add it without going all the way and providing zlib API compatibility. I don't have time for that, though. This project is for "fun", so I focus on what I'm most interested in, which are the actual algorithms.

I'm aware that zopfli (and ECT, which uses a modified version of zopfli) can produce a slightly better compression ratio than libdeflate level 12 on many inputs, mainly because zopfli spends a lot more time doing block splitting. I haven't tried to match that exactly yet; instead, the focus of libdeflate level 12 is near-optimal compression with much better performance. libdeflate v1.9 included improvements to block splitting that don't affect performance very much.

@andrews05

Good answer! Perhaps you should close this issue?

The thing I find interesting about ECT is that it manages to achieve zopfli-level compression while actually being incredibly fast. Like, it would fit perfectly on the end of libdeflate's performance curve as a hypothetical level 13 :)
