-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Proactively skip huffman compression based on sampling where non-comp… #2717
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Just have a couple style nits, otherwise it looks great!
const void* src, size_t srcSize, | ||
unsigned maxSymbolValue, unsigned tableLog, | ||
void* workSpace, size_t wkspSize, /**< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */ | ||
HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : api design :
as we get more and more parameters in the function signature,
it becomes more and more difficult to understand what this 1
or this 0
means.
For clarity, we end up commenting them, so that the reader can track what each value means.
... , 1 /* suspected uncompressible */);
.
Another possibility is to use enum
.
..., HUF_suspectedUncompressible);
In such a context, it's essentially a compiler-checked comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some thought, added comments rather than an enum; this is really just a stand-in for a boolean value so I think the argument name is sufficient to convey the meaning as far as the function definitions are concerned..
…ressibility is suspected
When a large block is suspected to be incompressible (based on ratio of literals to sequences), we evaluate whether or not huffman should be applied based on a smaller sampling of 4k blocks at the start and end of the buffer to proactively skip having to construct a histogram for the full data range. Benchmarks show improvements on low-compressibility samples:
Benchmarked on macos without PR:
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P0 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 1628.7 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 1596.6 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P5 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 850.4 MB/s (79994983)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P10 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 663.1 MB/s (74066054)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P50 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 564.4 MB/s (31792743)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P100 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 7765.7 MB/s ( 3570)
With PR:
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P0 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3322.8 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3373.9 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3323.8 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P5 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 853.9 MB/s (79994983)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P10 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 676.9 MB/s (74066054)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P50 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 540.4 MB/s (31792743)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P100 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 7838.4 MB/s ( 3570)