s2: Add AMD64 assembly for better mode #315

Merged
merged 10 commits into from
Feb 25, 2021
@klauspost klauspost commented Feb 9, 2021

Blocks:

benchmark                              old ns/op     new ns/op     delta
BenchmarkTwainEncode1e1/better-32      10.7          10.5          -1.87%
BenchmarkTwainEncode1e2/better-32      2947          280           -90.50%
BenchmarkTwainEncode1e3/better-32      6664          2525          -62.11%
BenchmarkTwainEncode1e4/better-32      47401         25461         -46.29%
BenchmarkTwainEncode1e5/better-32      528060        417367        -20.96%
BenchmarkTwainEncode1e6/better-32      2137499       1554364       -27.28%

benchmark                                                  old ns/op     new ns/op     delta
BenchmarkRandomEncodeBetterBlock1MB-32                     39476         38241         -3.13%
BenchmarkEncodeS2Block/0-html/block-better-32              10140         6761          -33.32%
BenchmarkEncodeS2Block/1-urls/block-better-32              141170        90141         -36.15%
BenchmarkEncodeS2Block/2-jpg/block-better-32               1026          848           -17.35%
BenchmarkEncodeS2Block/3-jpg_200b/block-better-32          332           24.3          -92.68%
BenchmarkEncodeS2Block/4-pdf/block-better-32               12266         7164          -41.59%
BenchmarkEncodeS2Block/5-html4/block-better-32             14229         8134          -42.84%
BenchmarkEncodeS2Block/6-txt1/block-better-32              40537         27718         -31.62%
BenchmarkEncodeS2Block/7-txt2/block-better-32              35890         24783         -30.95%
BenchmarkEncodeS2Block/8-txt3/block-better-32              104525        77463         -25.89%
BenchmarkEncodeS2Block/9-txt4/block-better-32              144537        104121        -27.96%
BenchmarkEncodeS2Block/10-pb/block-better-32               9017          5427          -39.81%
BenchmarkEncodeS2Block/11-gaviota/block-better-32          31386         20973         -33.18%
BenchmarkEncodeS2Block/12-txt1_128b/block-better-32        312           16.4          -94.74%
BenchmarkEncodeS2Block/13-txt1_1000b/block-better-32       578           136           -76.47%
BenchmarkEncodeS2Block/14-txt1_10000b/block-better-32      3278          1293          -60.56%
BenchmarkEncodeS2Block/15-txt1_20000b/block-better-32      6469          3820          -40.95%

benchmark                                                  old MB/s      new MB/s      speedup
BenchmarkRandomEncodeBetterBlock1MB-32                     26562.09      27420.04      1.03x
BenchmarkEncodeS2Block/0-html/block-better-32              10098.47      15145.41      1.50x
BenchmarkEncodeS2Block/1-urls/block-better-32              4973.34       7788.75       1.57x
BenchmarkEncodeS2Block/2-jpg/block-better-32               119973.57     145200.76     1.21x
BenchmarkEncodeS2Block/3-jpg_200b/block-better-32          602.41        8241.97       13.68x
BenchmarkEncodeS2Block/4-pdf/block-better-32               8348.31       14293.26      1.71x
BenchmarkEncodeS2Block/5-html4/block-better-32             28786.61      50355.67      1.75x
BenchmarkEncodeS2Block/6-txt1/block-better-32              3751.82       5486.93       1.46x
BenchmarkEncodeS2Block/7-txt2/block-better-32              3487.81       5051.03       1.45x
BenchmarkEncodeS2Block/8-txt3/block-better-32              4082.81       5509.15       1.35x
BenchmarkEncodeS2Block/9-txt4/block-better-32              3333.82       4627.90       1.39x
BenchmarkEncodeS2Block/10-pb/block-better-32               13151.91      21850.98      1.66x
BenchmarkEncodeS2Block/11-gaviota/block-better-32          5872.67       8788.25       1.50x
BenchmarkEncodeS2Block/12-txt1_128b/block-better-32        410.38        7791.86       18.99x
BenchmarkEncodeS2Block/13-txt1_1000b/block-better-32       1729.19       7370.56       4.26x
BenchmarkEncodeS2Block/14-txt1_10000b/block-better-32      3050.66       7736.81       2.54x
BenchmarkEncodeS2Block/15-txt1_20000b/block-better-32      3091.47       5235.17       1.69x

Streams, with/without assembly, 16 cores (in each pair the first line is with assembly, the second without):

github-june-2days-2019.json:
Compressing... 6273951764 -> 949146808 [15.13%]; 564ms, 10608.7MB/s
Compressing... 6273951764 -> 950079555 [15.14%]; 722ms, 8287.1MB/s

github-ranks-backup.bin:
Compressing... 1862623243 -> 555069246 [29.80%]; 261ms, 6805.8MB/s
Compressing... 1862623243 -> 555617002 [29.83%]; 384ms, 4625.9MB/s

enwik9:
Compressing... 1000000000 -> 426854233 [42.69%]; 229ms, 4164.5MB/s
Compressing... 1000000000 -> 427660256 [42.77%]; 333ms, 2863.9MB/s

nyc-taxi-data-10M.csv:
Compressing... 3325605752 -> 954776589 [28.71%]; 491ms, 6459.4MB/s
Compressing... 3325605752 -> 960330423 [28.88%]; 608ms, 5216.4MB/s

sharnd.out.2gb:
Compressing... 2147483647 -> 2147487753 [100.00%]; 174ms, 11770.0MB/s
Compressing... 2147483647 -> 2147487753 [100.00%]; 172ms, 11907.1MB/s

@klauspost klauspost marked this pull request as ready for review February 19, 2021 09:40