huff0: asm implementation of Decompress1X #596

Merged
merged 2 commits into from
May 23, 2022

Conversation

WojciechMula
Contributor

@WojciechMula WojciechMula commented May 13, 2022

Solves #595

Benchmarks (go test -run XYZ -bench 1X) on an Ice Lake machine.

benchmark                                                   old ns/op     new ns/op     delta
BenchmarkCompress1XReuseNone/digits-16                      184588        186723        +1.16%
BenchmarkCompress1XReuseNone/gettysburg-16                  5227          5294          +1.28%
BenchmarkCompress1XReuseNone/twain-16                       654643        631837        -3.48%
BenchmarkCompress1XReuseNone/low-ent.10k-16                 84314         84991         +0.80%
BenchmarkCompress1XReuseNone/superlow-ent-10k-16            34252         34385         +0.39%
BenchmarkCompress1XReuseNone/crash2-16                      1213          1252          +3.22%
BenchmarkCompress1XReuseNone/endzerobits-16                 268           281           +4.89%
BenchmarkCompress1XReuseNone/endnonzero-16                  844           876           +3.83%
BenchmarkCompress1XReuseNone/case1-16                       3904          3946          +1.08%
BenchmarkCompress1XReuseNone/case2-16                       3901          3942          +1.05%
BenchmarkCompress1XReuseNone/case3-16                       3923          3963          +1.02%
BenchmarkCompress1XReuseNone/pngdata.001-16                 167341        167843        +0.30%
BenchmarkCompress1XReuseNone/normcount2-16                  2257          2315          +2.57%
BenchmarkCompress1XReuseAllow/digits-16                     184447        185666        +0.66%
BenchmarkCompress1XReuseAllow/gettysburg-16                 4699          4806          +2.28%
BenchmarkCompress1XReuseAllow/twain-16                      646781        635147        -1.80%
BenchmarkCompress1XReuseAllow/low-ent.10k-16                83972         84630         +0.78%
BenchmarkCompress1XReuseAllow/superlow-ent-10k-16           33855         34139         +0.84%
BenchmarkCompress1XReuseAllow/crash2-16                     889           885           -0.46%
BenchmarkCompress1XReuseAllow/endzerobits-16                260           262           +0.92%
BenchmarkCompress1XReuseAllow/endnonzero-16                 611           618           +1.11%
BenchmarkCompress1XReuseAllow/case1-16                      3205          3171          -1.06%
BenchmarkCompress1XReuseAllow/case2-16                      3164          3161          -0.09%
BenchmarkCompress1XReuseAllow/case3-16                      3201          3168          -1.03%
BenchmarkCompress1XReuseAllow/pngdata.001-16                166807        166828        +0.01%
BenchmarkCompress1XReuseAllow/normcount2-16                 1779          1844          +3.65%
BenchmarkCompress1XReusePrefer/digits-16                    183785        185473        +0.92%
BenchmarkCompress1XReusePrefer/gettysburg-16                3018          3009          -0.30%
BenchmarkCompress1XReusePrefer/twain-16                     637243        631305        -0.93%
BenchmarkCompress1XReusePrefer/low-ent.10k-16               83624         84309         +0.82%
BenchmarkCompress1XReusePrefer/superlow-ent-10k-16          33316         33357         +0.12%
BenchmarkCompress1XReusePrefer/crash2-16                    199           200           +0.45%
BenchmarkCompress1XReusePrefer/endzerobits-16               183           188           +2.34%
BenchmarkCompress1XReusePrefer/endnonzero-16                192           194           +0.99%
BenchmarkCompress1XReusePrefer/case1-16                     299           298           -0.30%
BenchmarkCompress1XReusePrefer/case2-16                     249           252           +1.08%
BenchmarkCompress1XReusePrefer/case3-16                     252           254           +0.63%
BenchmarkCompress1XReusePrefer/pngdata.001-16               162023        161971        -0.03%
BenchmarkCompress1XReusePrefer/normcount2-16                326           326           -0.03%
BenchmarkCompress1XSizes/digits-100-16                      1420          1458          +2.68%
BenchmarkCompress1XSizes/digits-200-16                      1605          1651          +2.87%
BenchmarkCompress1XSizes/digits-500-16                      2145          2178          +1.54%
BenchmarkCompress1XSizes/digits-1000-16                     2997          3067          +2.34%
BenchmarkCompress1XSizes/digits-5000-16                     9778          9836          +0.59%
BenchmarkCompress1XSizes/digits-10000-16                    18312         18499         +1.02%
BenchmarkCompress1XSizes/digits-50000-16                    88397         89414         +1.15%
BenchmarkDecompress1XTable/digits-16                        392200        303088        -22.72%
BenchmarkDecompress1XTable/gettysburg-16                    7671          5701          -25.68%
BenchmarkDecompress1XTable/twain-16                         1250201       851679        -31.88%
BenchmarkDecompress1XTable/low-ent.10k-16                   139365        110457        -20.74%
BenchmarkDecompress1XTable/superlow-ent-10k-16              37111         29501         -20.51%
BenchmarkDecompress1XTable/crash2-16                        670           702           +4.78%
BenchmarkDecompress1XTable/endzerobits-16                   76.7          68.8          -10.31%
BenchmarkDecompress1XTable/endnonzero-16                    468           501           +7.07%
BenchmarkDecompress1XTable/case1-16                         1989          1945          -2.21%
BenchmarkDecompress1XTable/case2-16                         1936          1919          -0.88%
BenchmarkDecompress1XTable/case3-16                         1957          1948          -0.46%
BenchmarkDecompress1XTable/pngdata.001-16                   206514        144385        -30.08%
BenchmarkDecompress1XTable/normcount2-16                    1409          1352          -4.05%
BenchmarkDecompress1XNoTable/digits/100-16                  423           330           -22.09%
BenchmarkDecompress1XNoTable/digits/10000-16                38077         28327         -25.61%
BenchmarkDecompress1XNoTable/digits/262143-16               1043522       802526        -23.09%
BenchmarkDecompress1XNoTable/gettysburg/100-16              416           334           -19.74%
BenchmarkDecompress1XNoTable/gettysburg/10000-16            41724         28560         -31.55%
BenchmarkDecompress1XNoTable/gettysburg/262143-16           1141714       759146        -33.51%
BenchmarkDecompress1XNoTable/twain/100-16                   424           342           -19.40%
BenchmarkDecompress1XNoTable/twain/10000-16                 41842         28652         -31.52%
BenchmarkDecompress1XNoTable/twain/262143-16                1244988       850157        -31.71%
BenchmarkDecompress1XNoTable/low-ent.10k/100-16             441           446           +0.97%
BenchmarkDecompress1XNoTable/low-ent.10k/10000-16           35085         27606         -21.32%
BenchmarkDecompress1XNoTable/low-ent.10k/262143-16          914657        719273        -21.36%
BenchmarkDecompress1XNoTable/superlow-ent-10k/262143-16     920422        718830        -21.90%
BenchmarkDecompress1XNoTable/crash2/100-16                  408           332           -18.49%
BenchmarkDecompress1XNoTable/crash2/10000-16                37059         28189         -23.93%
BenchmarkDecompress1XNoTable/crash2/262143-16               971784        737302        -24.13%
BenchmarkDecompress1XNoTable/endzerobits/100-16             446           448           +0.58%
BenchmarkDecompress1XNoTable/endzerobits/10000-16           35144         27607         -21.45%
BenchmarkDecompress1XNoTable/endzerobits/262143-16          914147        719542        -21.29%
BenchmarkDecompress1XNoTable/endnonzero/100-16              446           448           +0.45%
BenchmarkDecompress1XNoTable/endnonzero/10000-16            35223         27629         -21.56%
BenchmarkDecompress1XNoTable/endnonzero/262143-16           918031        720103        -21.56%
BenchmarkDecompress1XNoTable/case1/100-16                   407           331           -18.71%
BenchmarkDecompress1XNoTable/case1/10000-16                 37955         28301         -25.44%
BenchmarkDecompress1XNoTable/case1/262143-16                991910        739995        -25.40%
BenchmarkDecompress1XNoTable/case2/100-16                   408           338           -17.29%
BenchmarkDecompress1XNoTable/case2/10000-16                 37403         28024         -25.08%
BenchmarkDecompress1XNoTable/case2/262143-16                972229        732974        -24.61%
BenchmarkDecompress1XNoTable/case3/100-16                   418           344           -17.79%
BenchmarkDecompress1XNoTable/case3/10000-16                 37588         28130         -25.16%
BenchmarkDecompress1XNoTable/case3/262143-16                977497        735540        -24.75%
BenchmarkDecompress1XNoTable/pngdata.001/100-16             430           379           -11.92%
BenchmarkDecompress1XNoTable/pngdata.001/10000-16           39719         27614         -30.48%
BenchmarkDecompress1XNoTable/pngdata.001/262143-16          1053768       730571        -30.67%
BenchmarkDecompress1XNoTable/normcount2/100-16              416           330           -20.57%
BenchmarkDecompress1XNoTable/normcount2/10000-16            38625         28498         -26.22%
BenchmarkDecompress1XNoTable/normcount2/262143-16           1008971       745795        -26.08%
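
For reading the delta column: it is the relative change of new ns/op against old ns/op, so negative values mean the new code is faster. A minimal sketch of that arithmetic, using two Decompress1XTable rows from the table above (the helper name delta is made up for this illustration). For very small timings the result can differ slightly from the table, presumably because the displayed ns/op values are rounded.

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs in percent.
// Negative values mean the new measurement is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// BenchmarkDecompress1XTable/digits-16
	fmt.Printf("%+.2f%%\n", delta(392200, 303088)) // -22.72%
	// BenchmarkDecompress1XTable/twain-16
	fmt.Printf("%+.2f%%\n", delta(1250201, 851679)) // -31.88%
}
```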

@klauspost
Owner

Good. I will be mostly afk for about a week.

@WojciechMula WojciechMula force-pushed the huff0-decompress1x-asm branch from 5f366e2 to 2965e11 May 19, 2022 06:10
@WojciechMula WojciechMula force-pushed the huff0-decompress1x-asm branch from 2965e11 to b58e2de May 19, 2022 06:20
We would just need to replace the variable shifts with shifts by immediate values.
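
To illustrate the distinction this remark draws, here is a minimal Go sketch of the same bit extraction written once with a variable shift count and once with a constant one. The names peekVar and peek11 and the width of 11 bits are hypothetical, picked only for this example; they are not taken from the huff0 code in this pull request.

```go
package main

import "fmt"

// peekVar returns the top n bits of v using a shift count that is only
// known at run time. Assumes 1 <= n <= 64 (hypothetical helper).
func peekVar(v uint64, n uint8) uint64 {
	return v >> ((64 - n) & 63)
}

// peek11 is the same operation specialized for n = 11. The shift count
// is a compile-time constant, so it can be encoded as an immediate.
func peek11(v uint64) uint64 {
	return v >> (64 - 11)
}

func main() {
	x := uint64(0xFEDCBA9876543210)
	fmt.Printf("%#x %#x\n", peekVar(x, 11), peek11(x)) // 0x7f6 0x7f6
}
```

On amd64 a run-time shift count has to be supplied in a register, while a constant count is part of the instruction itself, which is why a specialized variant only needs its shifts swapped.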
@WojciechMula WojciechMula marked this pull request as ready for review May 19, 2022 07:08
@klauspost klauspost merged commit e77bf31 into klauspost:master May 23, 2022
@klauspost klauspost deleted the huff0-decompress1x-asm branch May 23, 2022 11:37
@klauspost
Owner

klauspost commented May 23, 2022

Nice!

The BMI variant (compared to the pure amd64 asm) doesn't show any speedup on my system, but it is no worse either.
