Speed up huff0 table decode #184

Merged
merged 2 commits into from
Nov 22, 2019
@klauspost (Owner) commented Nov 21, 2019

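Big speedup on small blocks: about 14-17% less time per op on gettysburg and case1-3, and roughly 6.5-8% on the remaining inputs.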

```
λ benchstat old.txt new2.txt
name                                   old time/op    new time/op    delta
Decompress4XTable/digits-12               243µs ± 0%     223µs ± 0%   -8.03%          (p=0.008 n=5+5)
Decompress4XTable/gettysburg-12          5.81µs ± 0%    4.84µs ± 0%  -16.70%          (p=0.008 n=5+5)
Decompress4XTable/twain-12                735µs ± 1%     687µs ± 1%   -6.53%          (p=0.008 n=5+5)
Decompress4XTable/low-ent.10k-12         73.1µs ± 1%    67.2µs ± 0%   -8.15%          (p=0.008 n=5+5)
Decompress4XTable/superlow-ent-10k-12    20.4µs ± 0%    18.7µs ± 1%   -8.04%          (p=0.008 n=5+5)
Decompress4XTable/case1-12               2.51µs ± 1%    2.15µs ± 1%  -14.44%          (p=0.008 n=5+5)
Decompress4XTable/case2-12               2.47µs ± 0%    2.09µs ± 1%  -15.12%          (p=0.008 n=5+5)
Decompress4XTable/case3-12               2.50µs ± 0%    2.11µs ± 0%  -15.53%          (p=0.008 n=5+5)
Decompress4XTable/pngdata.001-12         98.9µs ± 1%    91.8µs ± 0%   -7.13%          (p=0.008 n=5+5)
Decompress4XTable/normcount2-12          1.60µs ± 0%    1.49µs ± 0%   -6.84%          (p=0.008 n=5+5)

name                                   old speed      new speed      delta
Decompress4XTable/digits-12             412MB/s ± 0%   448MB/s ± 0%   +8.72%          (p=0.008 n=5+5)
Decompress4XTable/gettysburg-12         266MB/s ± 0%   320MB/s ± 0%  +20.04%          (p=0.008 n=5+5)
Decompress4XTable/twain-12              356MB/s ± 1%   381MB/s ± 1%   +6.99%          (p=0.008 n=5+5)
Decompress4XTable/low-ent.10k-12        547MB/s ± 1%   596MB/s ± 0%   +8.87%          (p=0.008 n=5+5)
Decompress4XTable/superlow-ent-10k-12   516MB/s ± 0%   561MB/s ± 1%   +8.74%          (p=0.008 n=5+5)
Decompress4XTable/case1-12             21.9MB/s ± 1%  25.6MB/s ± 1%  +16.89%          (p=0.008 n=5+5)
Decompress4XTable/case2-12             18.2MB/s ± 0%  21.5MB/s ± 1%  +17.81%          (p=0.008 n=5+5)
Decompress4XTable/case3-12             19.2MB/s ± 0%  22.7MB/s ± 0%  +18.39%          (p=0.008 n=5+5)
Decompress4XTable/pngdata.001-12        518MB/s ± 1%   558MB/s ± 0%   +7.68%          (p=0.008 n=5+5)
Decompress4XTable/normcount2-12        54.3MB/s ± 0%  58.3MB/s ± 0%   +7.32%          (p=0.008 n=5+5)
```

Big speedup on small blocks.

```
name                                   old time/op    new time/op    delta
Decompress4XTable/digits-12               243µs ± 0%     228µs ± 1%   -5.87%          (p=0.008 n=5+5)
Decompress4XTable/gettysburg-12          5.81µs ± 0%    5.42µs ± 0%   -6.79%          (p=0.008 n=5+5)
Decompress4XTable/twain-12                735µs ± 1%     696µs ± 1%   -5.35%          (p=0.008 n=5+5)
Decompress4XTable/low-ent.10k-12         73.1µs ± 1%    69.0µs ± 0%   -5.64%          (p=0.008 n=5+5)
Decompress4XTable/superlow-ent-10k-12    20.4µs ± 0%    19.2µs ± 1%   -5.49%          (p=0.008 n=5+5)
Decompress4XTable/case1-12               2.51µs ± 1%    2.16µs ± 1%  -14.01%          (p=0.008 n=5+5)
Decompress4XTable/case2-12               2.47µs ± 0%    2.13µs ± 0%  -13.74%          (p=0.016 n=5+4)
Decompress4XTable/case3-12               2.50µs ± 0%    2.15µs ± 0%  -14.13%          (p=0.008 n=5+5)
Decompress4XTable/pngdata.001-12         98.9µs ± 1%    94.9µs ± 2%   -4.08%          (p=0.008 n=5+5)
Decompress4XTable/normcount2-12          1.60µs ± 0%    1.51µs ± 0%   -5.55%          (p=0.008 n=5+5)

name                                   old speed      new speed      delta
Decompress4XTable/digits-12             412MB/s ± 0%   438MB/s ± 1%   +6.24%          (p=0.008 n=5+5)
Decompress4XTable/gettysburg-12         266MB/s ± 0%   286MB/s ± 0%   +7.28%          (p=0.008 n=5+5)
Decompress4XTable/twain-12              356MB/s ± 1%   377MB/s ± 1%   +5.65%          (p=0.008 n=5+5)
Decompress4XTable/low-ent.10k-12        547MB/s ± 1%   580MB/s ± 0%   +5.98%          (p=0.008 n=5+5)
Decompress4XTable/superlow-ent-10k-12   516MB/s ± 0%   546MB/s ± 1%   +5.81%          (p=0.008 n=5+5)
Decompress4XTable/case1-12             21.9MB/s ± 1%  25.5MB/s ± 1%  +16.31%          (p=0.008 n=5+5)
Decompress4XTable/case2-12             18.2MB/s ± 0%  21.1MB/s ± 0%  +15.94%          (p=0.016 n=5+4)
Decompress4XTable/case3-12             19.2MB/s ± 0%  22.3MB/s ± 0%  +16.48%          (p=0.008 n=5+5)
Decompress4XTable/pngdata.001-12        518MB/s ± 1%   540MB/s ± 2%   +4.26%          (p=0.008 n=5+5)
Decompress4XTable/normcount2-12        54.3MB/s ± 0%  57.5MB/s ± 0%   +5.85%          (p=0.008 n=5+5)
```
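
A note for reading the two halves of each table: the time/op and speed deltas describe the same measurement, because throughput for a fixed input scales as the inverse of time. The sketch below is not part of this PR; `speedDeltaPct` is a hypothetical helper that shows the arithmetic. The results will only approximately match the reported speed deltas, since the printed time deltas are rounded and benchstat works from the raw samples.

```go
package main

import "fmt"

// speedDeltaPct converts a relative change in time/op (in percent, negative
// when faster) into the matching relative change in throughput (in percent).
// Throughput is bytes/time, so for a fixed input size it scales as 1/time.
// Hypothetical helper for illustration; not part of huff0 or benchstat.
func speedDeltaPct(timeDeltaPct float64) float64 {
	ratio := 1 + timeDeltaPct/100 // new time divided by old time
	return (1/ratio - 1) * 100
}

func main() {
	// Time deltas taken from the gettysburg, case1 and pngdata.001 rows above.
	for _, d := range []float64{-6.79, -14.01, -4.08} {
		fmt.Printf("time %+.2f%% -> speed %+.2f%%\n", d, speedDeltaPct(d))
	}
}
```

This is also why a given time reduction always shows up as a slightly larger speed increase: a 14% drop in time corresponds to a bit over 16% more throughput.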
@klauspost klauspost merged commit 7892b3d into master Nov 22, 2019
@klauspost klauspost deleted the huff0-faster-table-decode branch November 22, 2019 09:50