Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Proactively skip huffman compression based on sampling where non-comp… #2717

Merged
merged 1 commit into from
Jul 1, 2021

Conversation

binhdvo
Copy link
Contributor

@binhdvo binhdvo commented Jun 28, 2021

When a large block is suspected to be incompressible (based on ratio of literals to sequences), we evaluate whether or not huffman should be applied based on a smaller sampling of 4k blocks at the start and end of the buffer to proactively skip having to construct a histogram for the full data range. Benchmarks show improvements on low-compressibility samples:

Benchmarked on macos without PR:

binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P0 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 1628.7 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 1596.6 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P5 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 850.4 MB/s (79994983)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P10 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 663.1 MB/s (74066054)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P50 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 564.4 MB/s (31792743)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P100 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 7765.7 MB/s ( 3570)

With PR:

binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P0 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3322.8 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3373.9 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3323.8 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P5 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 853.9 MB/s (79994983)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P10 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 676.9 MB/s (74066054)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P50 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 540.4 MB/s (31792743)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P100 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 7838.4 MB/s ( 3570)

Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just have a couple style nits, otherwise it looks great!

const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned tableLog,
void* workSpace, size_t wkspSize, /**< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */
HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor : api design :

as we get more and more parameters in the function signature,
it becomes more and more difficult to understand what this 1 or this 0 means.
For clarity, we end up commenting them, so that the reader can track what each value means.
... , 1 /* suspected uncompressible */);.

Another possibility is to use enum.
..., HUF_suspectedUncompressible);
In such a context, it's essentially a compiler-checked comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thought, added comments rather than an enum; this is really just a stand-in for a boolean value so I think the argument name is sufficient to convey the meaning as far as the function definitions are concerned..

@binhdvo binhdvo merged commit b3e372c into facebook:dev Jul 1, 2021
@binhdvo binhdvo deleted the bootcamp branch August 24, 2021 02:40
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants