Improve benchmarks #12

Merged
merged 2 commits into from
Jul 23, 2020

milesgranger
Owner

Make benchmarks more comprehensive as suggested in BurntSushi/rust-snappy#34

@BurntSushi left a comment

LGTM

@milesgranger merged commit 92e2814 into master Jul 23, 2020
@milesgranger deleted the improve-benchmarks branch July 23, 2020 13:56
@milesgranger mentioned this pull request Jul 23, 2020
@martindurant

Update these benchmarks perhaps? Some time has passed, and c-snappy won't have changed, but I bet there's a chance that rust-snappy (snap) has, or that there's a faster, newer alternative.

@milesgranger
Owner Author

Hey @martindurant!

Thanks for the push! Looking back into it now, it seems like (one of?) the most efficient ways to pass the bytes back from Rust is to use PyBytes::new_with and calculate the (de)compressed size there, thus avoiding a double allocation: one for the (de)compression output and another when converting that output to PyBytes to hand over to Python.
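A pure-Python sketch of why knowing the size up front matters. This is not cramjam's code (which is Rust and would use PyBytes::new_with), and the function names decompressed_len and decompress_raw are hypothetical. The raw snappy format begins with the uncompressed length stored as a little-endian base-128 varint, so a decoder can allocate its output exactly once and decode straight into it; that is the same property that lets the Rust side fill a PyBytes buffer in place.

```python
from typing import Tuple


def decompressed_len(data: bytes) -> Tuple[int, int]:
    """Decode the varint that prefixes a raw snappy block (hypothetical helper).

    Returns (uncompressed_length, number_of_header_bytes).
    """
    result = 0
    shift = 0
    for i, byte in enumerate(data[:5]):  # a length < 2**32 fits in at most 5 varint bytes
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear marks the final varint byte
            return result, i + 1
        shift += 7
    raise ValueError("invalid or truncated length header")


def decompress_raw(data: bytes) -> bytearray:
    """Decode a raw snappy block into one buffer sized from its header (sketch)."""
    n, pos = decompressed_len(data)
    out = bytearray(n)  # the only output allocation: exact size known up front
    o = 0  # current write position in `out`
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:
            # Literal: upper 6 bits hold length-1, or 60..63 meaning
            # "length-1 follows in the next 1..4 little-endian bytes".
            length = tag >> 2
            if length >= 60:
                extra = length - 59
                length = int.from_bytes(data[pos:pos + extra], "little")
                pos += extra
            length += 1
            if o + length > n or pos + length > len(data):
                raise ValueError("literal overruns output or input")
            out[o:o + length] = data[pos:pos + length]
            pos += length
            o += length
            continue
        # Copy: kind 1 has a 1-byte offset operand, kind 2 two bytes, kind 3 four bytes.
        nbytes = {1: 1, 2: 2, 3: 4}[kind]
        if pos + nbytes > len(data):
            raise ValueError("truncated copy element")
        if kind == 1:
            length = 4 + ((tag >> 2) & 0x07)        # 3-bit length, 4..11
            offset = ((tag >> 5) << 8) | data[pos]  # 11-bit offset
        else:
            length = (tag >> 2) + 1                 # 6-bit length, 1..64
            offset = int.from_bytes(data[pos:pos + nbytes], "little")
        pos += nbytes
        if offset == 0 or offset > o or o + length > n:
            raise ValueError("invalid copy element")
        # Source and destination may overlap (offset < length), so copy byte by byte.
        for _ in range(length):
            out[o] = out[o - offset]
            o += 1
    if o != n:
        raise ValueError("decoded size does not match header")
    return out
```

Returning bytes(out) instead of the bytearray would copy the data a second time, which is the extra allocation described above. Compression is less convenient: the exact compressed size is not known until compression finishes, only an upper bound (snappy's MaxCompressedLength is 32 + n + n/6 for n input bytes).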

After prototyping with that a bit, it seems like cramjam could be reliably faster than python-snappy, except in a few cases:

Output from this evening's session.

--------------------------------------------------------------------------------------------------------- benchmark: 24 tests ----------------------------------------------------------------------------------------------------------
Name (time in us)                                             Min                   Max                  Mean             StdDev                Median                IQR            Outliers          OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-cramjam]        66.7232 (4.42)       223.8648 (1.51)        69.4540 (4.30)      6.1899 (2.01)        68.0690 (4.38)      0.7018 (3.47)      405;862  14,398.0150 (0.23)       6365           1
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-snappy]         52.7422 (3.50)       250.5509 (1.69)        54.7293 (3.39)      5.2059 (1.69)        53.5757 (3.45)      0.4624 (2.29)     827;2155  18,271.7505 (0.30)      14479           1
test_snappy_raw[alice29.txt-cramjam]                     291.6441 (19.33)      443.5740 (2.98)       299.0047 (18.52)    13.9264 (4.52)       293.5077 (18.88)     2.9434 (14.56)     396;745   3,344.4289 (0.05)       3267           1
test_snappy_raw[alice29.txt-snappy]                      599.9287 (39.77)      842.8260 (5.67)       612.4785 (37.93)    19.3221 (6.28)       603.6360 (38.83)    19.1506 (94.76)      157;57   1,632.7105 (0.03)       1532           1
test_snappy_raw[asyoulik.txt-cramjam]                    308.9788 (20.48)      579.4521 (3.90)       319.7238 (19.80)    24.1271 (7.84)       311.3754 (20.03)     8.2855 (41.00)     206;371   3,127.7000 (0.05)       2978           1
test_snappy_raw[asyoulik.txt-snappy]                     532.7296 (35.31)      931.2029 (6.27)       548.0748 (33.94)    35.2371 (11.45)      535.6853 (34.46)    16.2870 (80.59)     110;140   1,824.5686 (0.03)       1838           1
test_snappy_raw[fireworks.jpeg-cramjam]                   40.8231 (2.71)       405.6497 (2.73)        42.1765 (2.61)      4.8386 (1.57)        41.3642 (2.66)      0.5813 (2.88)     821;1427  23,709.8683 (0.38)      18561           1
test_snappy_raw[fireworks.jpeg-snappy]                    15.0851 (1.0)        229.4863 (1.54)        16.1488 (1.0)       3.0783 (1.0)         15.5442 (1.0)       0.4978 (2.46)    1578;2553  61,923.9929 (1.0)       36309           1
test_snappy_raw[geo.protodata-cramjam]                   106.9540 (7.09)       283.0932 (1.90)       110.6508 (6.85)      7.8300 (2.54)       108.2791 (6.97)      0.8768 (4.34)     631;1820   9,037.4441 (0.15)       7890           1
test_snappy_raw[geo.protodata-snappy]                    143.0362 (9.48)       510.1310 (3.43)       148.3856 (9.19)     10.6992 (3.48)       146.0779 (9.40)      2.3735 (11.74)     462;820   6,739.2005 (0.11)       5858           1
test_snappy_raw[html-cramjam]                            145.9508 (9.68)       511.5601 (3.44)       150.0290 (9.29)     10.3035 (3.35)       147.0926 (9.46)      0.7492 (3.71)     457;1320   6,665.3774 (0.11)       6255           1
test_snappy_raw[html-snappy]                             156.4212 (10.37)      331.1597 (2.23)       161.2253 (9.98)      9.9066 (3.22)       157.9481 (10.16)     0.9718 (4.81)     487;1283   6,202.4992 (0.10)       5832           1
test_snappy_raw[html_x_4-cramjam]                        156.4962 (10.37)      502.4560 (3.38)       161.3468 (9.99)     11.1545 (3.62)       158.2210 (10.18)     0.8270 (4.09)     447;1083   6,197.8281 (0.10)       5745           1
test_snappy_raw[html_x_4-snappy]                         634.0877 (42.03)      831.9281 (5.60)       649.9287 (40.25)    22.6907 (7.37)       639.3418 (41.13)    19.1698 (94.85)      187;84   1,538.6304 (0.02)       1541           1
test_snappy_raw[kppkn.gtb-cramjam]                       201.9522 (13.39)      320.4923 (2.16)       207.1594 (12.83)    10.3099 (3.35)       204.0809 (13.13)     1.4184 (7.02)      402;769   4,827.2011 (0.08)       4540           1
test_snappy_raw[kppkn.gtb-snappy]                        503.6397 (33.39)      713.1672 (4.80)       515.4781 (31.92)    20.4209 (6.63)       506.2551 (32.57)    15.6360 (77.37)     202;129   1,939.9465 (0.03)       1876           1
test_snappy_raw[lcet10.txt-cramjam]                      282.5959 (18.73)      507.0879 (3.41)       289.8384 (17.95)    16.2355 (5.27)       284.2164 (18.28)     2.4006 (11.88)     243;701   3,450.1979 (0.06)       3359           1
test_snappy_raw[lcet10.txt-snappy]                     1,590.2971 (105.42)   2,046.9069 (13.77)    1,627.8576 (100.80)   47.7467 (15.51)    1,615.7541 (103.95)   42.9840 (212.69)      56;32     614.3044 (0.01)        588           1
test_snappy_raw[paper-100k.pdf-cramjam]                   46.5959 (3.09)       172.2910 (1.16)        48.9973 (3.03)      6.1246 (1.99)        47.1906 (3.04)      0.2806 (1.39)    1057;2378  20,409.2833 (0.33)      11432           1
test_snappy_raw[paper-100k.pdf-snappy]                    20.4099 (1.35)       148.6209 (1.0)         21.6680 (1.34)      3.3306 (1.08)        21.1489 (1.36)      0.2021 (1.0)      782;4809  46,151.0589 (0.75)      18799           1
test_snappy_raw[plrabn12.txt-cramjam]                    340.2256 (22.55)      509.3641 (3.43)       348.9451 (21.61)    16.8228 (5.46)       342.2776 (22.02)     3.9656 (19.62)     143;356   2,865.7807 (0.05)       1652           1
test_snappy_raw[plrabn12.txt-snappy]                   2,184.4772 (144.81)   2,704.6911 (18.20)    2,232.1360 (138.22)   57.8854 (18.80)    2,213.1621 (142.38)   60.5900 (299.81)      38;13     448.0014 (0.01)        414           1
test_snappy_raw[urls.10K-cramjam]                        224.4790 (14.88)      483.0160 (3.25)       230.8938 (14.30)    15.5295 (5.04)       226.2061 (14.55)     1.7162 (8.49)      129;332   4,330.9956 (0.07)       1640           1
test_snappy_raw[urls.10K-snappy]                       1,811.8913 (120.11)   2,210.5193 (14.87)    1,848.8149 (114.49)   48.7328 (15.83)    1,836.4950 (118.15)   41.7850 (206.76)      45;28     540.8870 (0.01)        451           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
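A quick arithmetic check of which cases cramjam still loses, using the Mean column (µs) of the table above. The speedup is snappy's mean divided by cramjam's, so values above 1.0 favor cramjam. The numbers are copied by hand from the table; the names MEANS_US and speedups are just for illustration.

```python
# Mean times in microseconds, copied from the "Mean" column of the table above.
MEANS_US = {
    # file: (cramjam_mean, snappy_mean)
    "Mark.Twain-Tom.Sawyer.txt": (69.4540, 54.7293),
    "alice29.txt": (299.0047, 612.4785),
    "asyoulik.txt": (319.7238, 548.0748),
    "fireworks.jpeg": (42.1765, 16.1488),
    "geo.protodata": (110.6508, 148.3856),
    "html": (150.0290, 161.2253),
    "html_x_4": (161.3468, 649.9287),
    "kppkn.gtb": (207.1594, 515.4781),
    "lcet10.txt": (289.8384, 1627.8576),
    "paper-100k.pdf": (48.9973, 21.6680),
    "plrabn12.txt": (348.9451, 2232.1360),
    "urls.10K": (230.8938, 1848.8149),
}


def speedups(means):
    """Return snappy_mean / cramjam_mean per file; > 1.0 means cramjam was faster."""
    return {name: snappy / cramjam for name, (cramjam, snappy) in means.items()}


if __name__ == "__main__":
    # Print files from worst to best for cramjam.
    for name, ratio in sorted(speedups(MEANS_US).items(), key=lambda kv: kv[1]):
        verdict = "cramjam faster" if ratio > 1 else "snappy faster"
        print(f"{name:28} {ratio:5.2f}x  {verdict}")
```

By this measure python-snappy is faster only on Mark.Twain-Tom.Sawyer.txt (about 0.79x), paper-100k.pdf (about 0.44x) and fireworks.jpeg (about 0.38x); cramjam is faster on the other nine files, from about 1.07x on html up to about 8x on urls.10K.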

@martindurant

That's what I wanted to hear! It is interesting that the comparison is so favourable in some cases, but still marginally worse in others. Still: I'm convinced, and the rest of the algorithms were already better. I'll try to get some help with the conda-forge recipe, and then fastparquet can finally ditch python-snappy. In fact, there will be an argument to archive python-snappy (which I also co-maintain) eventually.

@martindurant

If you are happy with this change, it should be worth a release.

@martindurant

Ping @milesgranger: I would love to know when you think you might include this and release it. Please also update the benchmarks in https://github.com/milesgranger/pyrus-cramjam/blob/master/benchmarks/README.md when ready.

@milesgranger
Owner Author

Hi, most likely not until after the weekend at least, and there are still some details to flesh out. That was just some prototyping; there are some issues with the approach that may require a fair amount of refactoring. I'll make a PR and ping you when it's ready.

3 participants