Uniform sampling: use Canon's method #1287

Merged: 10 commits into rust-random:master on Mar 24, 2023

@dhardy dhardy commented Feb 17, 2023

Closes #570, #1145, #1154, #1196, #1286. See also #1172 (TODO: SIMD), #494 (here we add "unbiased" feature flag).

Also implements PartialEq for all our Uniform impls and Eq for all but FP. See #1217.
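As a rough illustration of why `Eq` is left out for floating-point ranges: `f64` implements only `PartialEq`, because NaN is not equal to itself, so a struct holding floats cannot derive `Eq`. The sketch below uses hypothetical stand-in types; `IntRange`, `FloatRange`, their fields, and `requires_eq` are assumptions for illustration, not the crate's actual definitions.

```rust
/// Hypothetical stand-in for an integer uniform range (not rand's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IntRange {
    pub low: i32,
    pub range: u32,
}

/// Hypothetical stand-in for a floating-point uniform range.
/// `Eq` cannot be derived here because `f64` does not implement `Eq`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FloatRange {
    pub low: f64,
    pub scale: f64,
}

/// Accepts only types whose equality is a total relation.
pub fn requires_eq<T: Eq>(_value: &T) {}
```

Calling `requires_eq` with an `IntRange` compiles, while passing a `FloatRange` is rejected at compile time. Comparing a `FloatRange` containing NaN with itself yields `false`.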


Yet another PR to finally update Uniform integer sampling (maybe):

  • Uses Canon's method (up to two RNG samples) for distribution and single sampling
  • Adds an "unbiased" feature flag, which instead uses Lemire's method for distributions and Canon's method with unlimited samples for single-sampling
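The following is a minimal sketch of the bounded variant (up to two RNG samples) mentioned above, written for `u32` only. It is an illustration under assumptions, not the code merged in this PR: the function name `canon_u32` and the closure `next`, which stands in for any source of random 32-bit words, are hypothetical.

```rust
/// Sketch of Canon's method for a value in `0..s`, using at most two draws.
/// `next` is a hypothetical stand-in for an RNG producing uniform `u32` words.
fn canon_u32(mut next: impl FnMut() -> u32, s: u32) -> u32 {
    assert!(s > 0, "range must be non-empty");

    // First draw: the high half of the 64-bit product is the candidate
    // result, the low half is the fractional part.
    let m = (next() as u64) * (s as u64);
    let mut result = (m >> 32) as u32;
    let lo = m as u32;

    // Further random bits can only change the result when the fractional
    // part is within `s` of wrapping around, i.e. lo > 2^32 - s.
    if lo > s.wrapping_neg() {
        // Second draw: only its high half matters; if adding it to `lo`
        // overflows, the carry bumps the result by one.
        let new_hi = (((next() as u64) * (s as u64)) >> 32) as u32;
        let (_, carry) = lo.overflowing_add(new_hi);
        result += carry as u32;
    }
    result
}
```

For example, `canon_u32(|| u32::MAX, 6)` yields `5` without taking a second draw. Because the refinement is cut off after the second sample, a small residual bias remains; the "unbiased" option described in this PR removes that cutoff.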

Based on canon-uniform-benches branch, revised.
This is a small tweak unsupported by evidence, but brings SIMD in line with unbiased integer range sampling.
Note: unbiased does pass current value-stability tests, but could fail extra ones in the future.

dhardy commented Feb 17, 2023

Baseline results (new benchmark on master)

samplei8/SmallRng/single
time: [1.9184 ns 1.9187 ns 1.9190 ns]
Found 20223 outliers among 100000 measurements (20.22%)
1018 (1.02%) low severe
16 (0.02%) low mild
158 (0.16%) high mild
19031 (19.03%) high severe
Benchmarking samplei8/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 52520.
samplei8/SmallRng/distr time: [1.1100 ns 1.1103 ns 1.1107 ns]
Found 13145 outliers among 100000 measurements (13.14%)
5635 (5.63%) low severe
423 (0.42%) low mild
2051 (2.05%) high mild
5036 (5.04%) high severe
samplei8/ChaCha8Rng/single
time: [2.3286 ns 2.3305 ns 2.3324 ns]
Found 5234 outliers among 100000 measurements (5.23%)
1877 (1.88%) high mild
3357 (3.36%) high severe
samplei8/ChaCha8Rng/distr
time: [1.7106 ns 1.7107 ns 1.7109 ns]
Found 4087 outliers among 100000 measurements (4.09%)
444 (0.44%) low mild
2617 (2.62%) high mild
1026 (1.03%) high severe
samplei8/Pcg32/single time: [1.6857 ns 1.6865 ns 1.6873 ns]
Found 6777 outliers among 100000 measurements (6.78%)
510 (0.51%) high mild
6267 (6.27%) high severe
samplei8/Pcg32/distr time: [1.2538 ns 1.2539 ns 1.2540 ns]
Found 27845 outliers among 100000 measurements (27.84%)
22197 (22.20%) low severe
47 (0.05%) low mild
7 (0.01%) high mild
5594 (5.59%) high severe
samplei8/Pcg64/single time: [2.1280 ns 2.1290 ns 2.1301 ns]
Found 22971 outliers among 100000 measurements (22.97%)
13429 (13.43%) low severe
5441 (5.44%) low mild
558 (0.56%) high mild
3543 (3.54%) high severe
samplei8/Pcg64/distr time: [1.4348 ns 1.4349 ns 1.4350 ns]
Found 40705 outliers among 100000 measurements (40.70%)
16283 (16.28%) low severe
24422 (24.42%) high severe

samplei16/SmallRng/single
time: [1.9058 ns 1.9059 ns 1.9060 ns]
Found 2986 outliers among 100000 measurements (2.99%)
2986 (2.99%) high severe
Benchmarking samplei16/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 52870.
samplei16/SmallRng/distr
time: [1.0703 ns 1.0706 ns 1.0709 ns]
Found 986 outliers among 100000 measurements (0.99%)
444 (0.44%) high mild
542 (0.54%) high severe
samplei16/ChaCha8Rng/single
time: [2.0399 ns 2.0404 ns 2.0409 ns]
Found 4574 outliers among 100000 measurements (4.57%)
138 (0.14%) low mild
65 (0.07%) high mild
4371 (4.37%) high severe
samplei16/ChaCha8Rng/distr
time: [1.7870 ns 1.7874 ns 1.7877 ns]
Found 4997 outliers among 100000 measurements (5.00%)
1024 (1.02%) low mild
2289 (2.29%) high mild
1684 (1.68%) high severe
samplei16/Pcg32/single time: [1.6959 ns 1.6967 ns 1.6975 ns]
Found 2849 outliers among 100000 measurements (2.85%)
554 (0.55%) high mild
2295 (2.29%) high severe
samplei16/Pcg32/distr time: [1.2458 ns 1.2460 ns 1.2461 ns]
Found 4853 outliers among 100000 measurements (4.85%)
453 (0.45%) high mild
4400 (4.40%) high severe
samplei16/Pcg64/single time: [1.9074 ns 1.9076 ns 1.9078 ns]
Found 3419 outliers among 100000 measurements (3.42%)
14 (0.01%) low mild
1315 (1.31%) high mild
2090 (2.09%) high severe
samplei16/Pcg64/distr time: [1.4340 ns 1.4342 ns 1.4345 ns]
Found 34978 outliers among 100000 measurements (34.98%)
22432 (22.43%) low severe
12546 (12.55%) high severe

samplei32/SmallRng/single
time: [4.9445 ns 4.9550 ns 4.9655 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high severe
Benchmarking samplei32/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 50180.
samplei32/SmallRng/distr
time: [1.8612 ns 1.8700 ns 1.8791 ns]
Found 8951 outliers among 100000 measurements (8.95%)
5188 (5.19%) high mild
3763 (3.76%) high severe
samplei32/ChaCha8Rng/single
time: [5.8213 ns 5.8339 ns 5.8463 ns]
samplei32/ChaCha8Rng/distr
time: [2.3517 ns 2.3605 ns 2.3697 ns]
Found 8845 outliers among 100000 measurements (8.85%)
5780 (5.78%) high mild
3065 (3.06%) high severe
samplei32/Pcg32/single time: [4.7795 ns 4.7899 ns 4.8005 ns]
Found 3 outliers among 100000 measurements (0.00%)
3 (0.00%) high severe
samplei32/Pcg32/distr time: [2.0956 ns 2.1029 ns 2.1099 ns]
Found 9127 outliers among 100000 measurements (9.13%)
5384 (5.38%) high mild
3743 (3.74%) high severe
samplei32/Pcg64/single time: [5.4698 ns 5.4821 ns 5.4942 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei32/Pcg64/distr time: [2.5071 ns 2.5155 ns 2.5239 ns]
Found 8743 outliers among 100000 measurements (8.74%)
5745 (5.75%) high mild
2998 (3.00%) high severe

samplei64/SmallRng/single
time: [5.9268 ns 5.9361 ns 5.9454 ns]
Found 2 outliers among 100000 measurements (0.00%)
2 (0.00%) high mild
samplei64/SmallRng/distr
time: [1.7516 ns 1.7579 ns 1.7644 ns]
Found 9262 outliers among 100000 measurements (9.26%)
5356 (5.36%) high mild
3906 (3.91%) high severe
samplei64/ChaCha8Rng/single
time: [7.8579 ns 7.8709 ns 7.8840 ns]
Found 3 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
2 (0.00%) high severe
samplei64/ChaCha8Rng/distr
time: [3.5666 ns 3.5778 ns 3.5892 ns]
Found 8734 outliers among 100000 measurements (8.73%)
6057 (6.06%) high mild
2677 (2.68%) high severe
samplei64/Pcg32/single time: [7.1110 ns 7.1241 ns 7.1368 ns]
samplei64/Pcg32/distr time: [2.9155 ns 2.9241 ns 2.9327 ns]
Found 9162 outliers among 100000 measurements (9.16%)
5522 (5.52%) high mild
3640 (3.64%) high severe
samplei64/Pcg64/single time: [6.6004 ns 6.6123 ns 6.6247 ns]
Found 62 outliers among 100000 measurements (0.06%)
62 (0.06%) high mild
samplei64/Pcg64/distr time: [2.5028 ns 2.5110 ns 2.5195 ns]
Found 9183 outliers among 100000 measurements (9.18%)
4994 (4.99%) high mild
4189 (4.19%) high severe

samplei128/SmallRng/single
time: [11.482 ns 11.496 ns 11.510 ns]
Found 185 outliers among 100000 measurements (0.18%)
185 (0.18%) high mild
samplei128/SmallRng/distr
time: [5.6678 ns 5.6780 ns 5.6879 ns]
Found 8484 outliers among 100000 measurements (8.48%)
7430 (7.43%) high mild
1054 (1.05%) high severe
samplei128/ChaCha8Rng/single
time: [14.400 ns 14.419 ns 14.439 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei128/ChaCha8Rng/distr
time: [7.8090 ns 7.8233 ns 7.8374 ns]
Found 8322 outliers among 100000 measurements (8.32%)
6595 (6.59%) high mild
1727 (1.73%) high severe
samplei128/Pcg32/single time: [13.412 ns 13.430 ns 13.448 ns]
Found 15 outliers among 100000 measurements (0.01%)
15 (0.01%) high mild
samplei128/Pcg32/distr time: [7.1153 ns 7.1296 ns 7.1429 ns]
Found 8659 outliers among 100000 measurements (8.66%)
6130 (6.13%) high mild
2529 (2.53%) high severe
samplei128/Pcg64/single time: [12.365 ns 12.382 ns 12.399 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei128/Pcg64/distr time: [6.5060 ns 6.5183 ns 6.5302 ns]
Found 8991 outliers among 100000 measurements (8.99%)
5736 (5.74%) high mild
3255 (3.25%) high severe

New results (compared to baseline)

samplei8/SmallRng/single
time: [1.4978 ns 1.4982 ns 1.4986 ns]
change: [-21.946% -21.917% -21.892%] (p = 0.00 < 0.05)
Performance has improved.
Found 17805 outliers among 100000 measurements (17.80%)
1817 (1.82%) low severe
18 (0.02%) low mild
530 (0.53%) high mild
15440 (15.44%) high severe
samplei8/SmallRng/distr time: [1.8966 ns 1.8971 ns 1.8977 ns]
change: [+70.464% +70.566% +70.650%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15393 outliers among 100000 measurements (15.39%)
10589 (10.59%) low severe
33 (0.03%) low mild
6 (0.01%) high mild
4765 (4.76%) high severe
samplei8/ChaCha8Rng/single
time: [2.0854 ns 2.0858 ns 2.0862 ns]
change: [-10.575% -10.497% -10.423%] (p = 0.00 < 0.05)
Performance has improved.
Found 593 outliers among 100000 measurements (0.59%)
4 (0.00%) low mild
474 (0.47%) high mild
115 (0.12%) high severe
samplei8/ChaCha8Rng/distr
time: [2.6350 ns 2.6357 ns 2.6364 ns]
change: [+54.015% +54.066% +54.109%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16017 outliers among 100000 measurements (16.02%)
1602 (1.60%) low mild
10054 (10.05%) high mild
4361 (4.36%) high severe
samplei8/Pcg32/single time: [1.4994 ns 1.5000 ns 1.5005 ns]
change: [-11.112% -11.060% -11.008%] (p = 0.00 < 0.05)
Performance has improved.
Found 34661 outliers among 100000 measurements (34.66%)
1486 (1.49%) low severe
20299 (20.30%) low mild
68 (0.07%) high mild
12808 (12.81%) high severe
Benchmarking samplei8/Pcg32/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 53100.
samplei8/Pcg32/distr time: [1.0578 ns 1.0580 ns 1.0582 ns]
change: [-15.545% -15.496% -15.430%] (p = 0.00 < 0.05)
Performance has improved.
Found 25037 outliers among 100000 measurements (25.04%)
5278 (5.28%) low severe
113 (0.11%) low mild
2061 (2.06%) high mild
17585 (17.59%) high severe
samplei8/Pcg64/single time: [1.9000 ns 1.9004 ns 1.9009 ns]
change: [-10.787% -10.738% -10.688%] (p = 0.00 < 0.05)
Performance has improved.
Found 4558 outliers among 100000 measurements (4.56%)
343 (0.34%) high mild
4215 (4.21%) high severe
samplei8/Pcg64/distr time: [1.4648 ns 1.4649 ns 1.4651 ns]
change: [+2.0833% +2.0954% +2.1074%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12686 outliers among 100000 measurements (12.69%)
7611 (7.61%) low severe
50 (0.05%) low mild
10 (0.01%) high mild
5015 (5.01%) high severe

samplei16/SmallRng/single
time: [1.6958 ns 1.6961 ns 1.6963 ns]
change: [-11.022% -11.008% -10.994%] (p = 0.00 < 0.05)
Performance has improved.
Found 3109 outliers among 100000 measurements (3.11%)
2 (0.00%) high mild
3107 (3.11%) high severe
samplei16/SmallRng/distr
time: [1.9426 ns 1.9432 ns 1.9437 ns]
change: [+81.022% +81.160% +81.268%] (p = 0.00 < 0.05)
Performance has regressed.
Found 30150 outliers among 100000 measurements (30.15%)
12940 (12.94%) low severe
17210 (17.21%) high severe
samplei16/ChaCha8Rng/single
time: [1.8547 ns 1.8553 ns 1.8559 ns]
change: [-9.1088% -9.0724% -9.0349%] (p = 0.00 < 0.05)
Performance has improved.
Found 14926 outliers among 100000 measurements (14.93%)
8618 (8.62%) low mild
2771 (2.77%) high mild
3537 (3.54%) high severe
samplei16/ChaCha8Rng/distr
time: [2.6924 ns 2.6935 ns 2.6947 ns]
change: [+50.625% +50.699% +50.754%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4189 outliers among 100000 measurements (4.19%)
974 (0.97%) high mild
3215 (3.21%) high severe
samplei16/Pcg32/single time: [1.5690 ns 1.5693 ns 1.5696 ns]
change: [-7.5591% -7.5113% -7.4643%] (p = 0.00 < 0.05)
Performance has improved.
Found 10941 outliers among 100000 measurements (10.94%)
1206 (1.21%) low severe
15 (0.01%) low mild
91 (0.09%) high mild
9629 (9.63%) high severe
Benchmarking samplei16/Pcg32/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 53730.
samplei16/Pcg32/distr time: [1.0367 ns 1.0369 ns 1.0370 ns]
change: [-16.514% -16.471% -16.428%] (p = 0.00 < 0.05)
Performance has improved.
Found 7284 outliers among 100000 measurements (7.28%)
9 (0.01%) low severe
2537 (2.54%) high mild
4738 (4.74%) high severe
samplei16/Pcg64/single time: [1.7850 ns 1.7854 ns 1.7857 ns]
change: [-6.4245% -6.4057% -6.3846%] (p = 0.00 < 0.05)
Performance has improved.
Found 5112 outliers among 100000 measurements (5.11%)
38 (0.04%) high mild
5074 (5.07%) high severe
samplei16/Pcg64/distr time: [1.4695 ns 1.4695 ns 1.4696 ns]
change: [+2.4432% +2.4613% +2.4796%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3202 outliers among 100000 measurements (3.20%)
12 (0.01%) high mild
3190 (3.19%) high severe

samplei32/SmallRng/single
time: [2.9643 ns 2.9717 ns 2.9790 ns]
change: [-40.218% -40.027% -39.812%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei32/SmallRng/distr
time: [2.1418 ns 2.1468 ns 2.1520 ns]
change: [+14.158% +14.625% +15.127%] (p = 0.00 < 0.05)
Performance has regressed.
samplei32/ChaCha8Rng/single
time: [3.4354 ns 3.4436 ns 3.4519 ns]
change: [-41.166% -40.973% -40.797%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/ChaCha8Rng/distr
time: [4.3137 ns 4.3205 ns 4.3273 ns]
change: [+82.252% +83.029% +83.739%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100000 measurements (0.02%)
17 (0.02%) high mild
samplei32/Pcg32/single time: [2.8065 ns 2.8139 ns 2.8213 ns]
change: [-41.459% -41.254% -41.074%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg32/distr time: [2.2714 ns 2.2769 ns 2.2826 ns]
change: [+7.8178% +8.2767% +8.7563%] (p = 0.00 < 0.05)
Performance has regressed.
samplei32/Pcg64/single time: [3.5373 ns 3.5460 ns 3.5546 ns]
change: [-35.538% -35.317% -35.115%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg64/distr time: [2.8174 ns 2.8238 ns 2.8304 ns]
change: [+11.798% +12.255% +12.692%] (p = 0.00 < 0.05)
Performance has regressed.

samplei64/SmallRng/single
time: [4.4161 ns 4.4223 ns 4.4285 ns]
change: [-25.650% -25.501% -25.338%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100000 measurements (0.01%)
8 (0.01%) high mild
samplei64/SmallRng/distr
time: [1.9359 ns 1.9407 ns 1.9454 ns]
change: [+9.9079% +10.397% +10.890%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high severe
samplei64/ChaCha8Rng/single
time: [5.7185 ns 5.7264 ns 5.7344 ns]
change: [-27.418% -27.246% -27.100%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei64/ChaCha8Rng/distr
time: [4.0937 ns 4.1020 ns 4.1103 ns]
change: [+14.257% +14.652% +15.085%] (p = 0.00 < 0.05)
Performance has regressed.
samplei64/Pcg32/single time: [4.9361 ns 4.9447 ns 4.9533 ns]
change: [-30.763% -30.592% -30.414%] (p = 0.00 < 0.05)
Performance has improved.
Found 48 outliers among 100000 measurements (0.05%)
44 (0.04%) high mild
4 (0.00%) high severe
samplei64/Pcg32/distr time: [3.3706 ns 3.3777 ns 3.3847 ns]
change: [+15.104% +15.511% +15.898%] (p = 0.00 < 0.05)
Performance has regressed.
samplei64/Pcg64/single time: [4.7009 ns 4.7079 ns 4.7150 ns]
change: [-28.960% -28.801% -28.651%] (p = 0.00 < 0.05)
Performance has improved.
samplei64/Pcg64/distr time: [2.8317 ns 2.8380 ns 2.8442 ns]
change: [+12.584% +13.020% +13.467%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100000 measurements (0.00%)
3 (0.00%) high severe

samplei128/SmallRng/single
time: [9.6697 ns 9.6778 ns 9.6860 ns]
change: [-15.933% -15.813% -15.695%] (p = 0.00 < 0.05)
Performance has improved.
Found 20 outliers among 100000 measurements (0.02%)
20 (0.02%) high mild
samplei128/SmallRng/distr
time: [6.7277 ns 6.7370 ns 6.7460 ns]
change: [+18.371% +18.650% +18.908%] (p = 0.00 < 0.05)
Performance has regressed.
Found 95 outliers among 100000 measurements (0.10%)
95 (0.10%) high mild
samplei128/ChaCha8Rng/single
time: [12.092 ns 12.107 ns 12.121 ns]
change: [-16.186% -16.036% -15.883%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100000 measurements (0.01%)
3 (0.00%) high mild
2 (0.00%) high severe
samplei128/ChaCha8Rng/distr
time: [8.8107 ns 8.8237 ns 8.8367 ns]
change: [+12.524% +12.787% +13.051%] (p = 0.00 < 0.05)
Performance has regressed.
samplei128/Pcg32/single time: [10.177 ns 10.190 ns 10.203 ns]
change: [-24.246% -24.126% -23.979%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei128/Pcg32/distr time: [8.3067 ns 8.3188 ns 8.3311 ns]
change: [+16.386% +16.681% +16.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 100 outliers among 100000 measurements (0.10%)
100 (0.10%) high mild
samplei128/Pcg64/single time: [10.039 ns 10.047 ns 10.056 ns]
change: [-18.984% -18.858% -18.732%] (p = 0.00 < 0.05)
Performance has improved.
Found 47 outliers among 100000 measurements (0.05%)
47 (0.05%) high mild
samplei128/Pcg64/distr time: [7.0425 ns 7.0534 ns 7.0643 ns]
change: [+7.9572% +8.2093% +8.4552%] (p = 0.00 < 0.05)
Performance has regressed.
Found 19 outliers among 100000 measurements (0.02%)
18 (0.02%) high mild
1 (0.00%) high severe

Looks like a decent improvement for single-sampling, but considerably worse for distribution sampling.

New results (unbiased feature)

samplei8/SmallRng/single
time: [1.9076 ns 1.9082 ns 1.9089 ns]
change: [-0.5836% -0.5448% -0.5079%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 22388 outliers among 100000 measurements (22.39%)
18016 (18.02%) low severe
44 (0.04%) low mild
22 (0.02%) high mild
4306 (4.31%) high severe
Benchmarking samplei8/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 51640.
samplei8/SmallRng/distr time: [1.1186 ns 1.1188 ns 1.1189 ns]
change: [+1.0264% +1.1032% +1.1788%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7570 outliers among 100000 measurements (7.57%)
186 (0.19%) low severe
3 (0.00%) low mild
2649 (2.65%) high mild
4732 (4.73%) high severe
samplei8/ChaCha8Rng/single
time: [2.0471 ns 2.0477 ns 2.0483 ns]
change: [-12.214% -12.133% -12.059%] (p = 0.00 < 0.05)
Performance has improved.
Found 14935 outliers among 100000 measurements (14.94%)
2 (0.00%) low severe
642 (0.64%) low mild
5812 (5.81%) high mild
8479 (8.48%) high severe
samplei8/ChaCha8Rng/distr
time: [1.7169 ns 1.7173 ns 1.7176 ns]
change: [+0.3602% +0.3809% +0.4034%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4558 outliers among 100000 measurements (4.56%)
138 (0.14%) low mild
2210 (2.21%) high mild
2210 (2.21%) high severe
samplei8/Pcg32/single time: [1.7169 ns 1.7175 ns 1.7181 ns]
change: [+1.7780% +1.8401% +1.8967%] (p = 0.00 < 0.05)
Performance has regressed.
Found 30316 outliers among 100000 measurements (30.32%)
6836 (6.84%) low mild
920 (0.92%) high mild
22560 (22.56%) high severe
samplei8/Pcg32/distr time: [1.2623 ns 1.2624 ns 1.2626 ns]
change: [+0.6649% +0.6766% +0.6904%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 27891 outliers among 100000 measurements (27.89%)
19384 (19.38%) low severe
75 (0.07%) low mild
7 (0.01%) high mild
8425 (8.43%) high severe
samplei8/Pcg64/single time: [2.0078 ns 2.0088 ns 2.0097 ns]
change: [-5.7091% -5.6495% -5.5883%] (p = 0.00 < 0.05)
Performance has improved.
Found 16243 outliers among 100000 measurements (16.24%)
8554 (8.55%) high mild
7689 (7.69%) high severe
samplei8/Pcg64/distr time: [1.4221 ns 1.4222 ns 1.4224 ns]
change: [-0.8919% -0.8796% -0.8670%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6857 outliers among 100000 measurements (6.86%)
17 (0.02%) low severe
3 (0.00%) high mild
6837 (6.84%) high severe

samplei16/SmallRng/single
time: [1.7045 ns 1.7048 ns 1.7052 ns]
change: [-10.573% -10.550% -10.531%] (p = 0.00 < 0.05)
Performance has improved.
Found 45291 outliers among 100000 measurements (45.29%)
20430 (20.43%) low severe
24861 (24.86%) high severe
Benchmarking samplei16/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 52320.
samplei16/SmallRng/distr
time: [1.0812 ns 1.0815 ns 1.0819 ns]
change: [+0.5067% +0.6014% +0.6705%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1152 outliers among 100000 measurements (1.15%)
290 (0.29%) high mild
862 (0.86%) high severe
samplei16/ChaCha8Rng/single
time: [1.9882 ns 1.9886 ns 1.9890 ns]
change: [-2.5715% -2.5408% -2.5091%] (p = 0.00 < 0.05)
Performance has improved.
Found 8702 outliers among 100000 measurements (8.70%)
301 (0.30%) low mild
4494 (4.49%) high mild
3907 (3.91%) high severe
samplei16/ChaCha8Rng/distr
time: [1.7356 ns 1.7360 ns 1.7363 ns]
change: [-2.8998% -2.8740% -2.8467%] (p = 0.00 < 0.05)
Performance has improved.
Found 13632 outliers among 100000 measurements (13.63%)
2 (0.00%) low mild
12538 (12.54%) high mild
1092 (1.09%) high severe
samplei16/Pcg32/single time: [1.5160 ns 1.5162 ns 1.5164 ns]
change: [-10.682% -10.638% -10.594%] (p = 0.00 < 0.05)
Performance has improved.
Found 26466 outliers among 100000 measurements (26.47%)
13076 (13.08%) low severe
217 (0.22%) low mild
229 (0.23%) high mild
12944 (12.94%) high severe
samplei16/Pcg32/distr time: [1.2681 ns 1.2684 ns 1.2686 ns]
change: [+1.7735% +1.7986% +1.8233%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23148 outliers among 100000 measurements (23.15%)
208 (0.21%) low severe
1 (0.00%) low mild
84 (0.08%) high mild
22855 (22.86%) high severe
samplei16/Pcg64/single time: [1.9071 ns 1.9077 ns 1.9083 ns]
change: [-0.0228% +0.0049% +0.0417%] (p = 0.74 > 0.05)
No change in performance detected.
Found 36106 outliers among 100000 measurements (36.11%)
18848 (18.85%) low severe
1174 (1.17%) low mild
88 (0.09%) high mild
15996 (16.00%) high severe
samplei16/Pcg64/distr time: [1.4509 ns 1.4511 ns 1.4514 ns]
change: [+1.1517% +1.1772% +1.2000%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18304 outliers among 100000 measurements (18.30%)
5057 (5.06%) low severe
13 (0.01%) low mild
21 (0.02%) high mild
13213 (13.21%) high severe

samplei32/SmallRng/single
time: [3.6718 ns 3.6817 ns 3.6919 ns]
change: [-25.935% -25.697% -25.453%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/SmallRng/distr
time: [1.8915 ns 1.8981 ns 1.9048 ns]
change: [+0.8519% +1.3457% +1.7827%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8712 outliers among 100000 measurements (8.71%)
5349 (5.35%) high mild
3363 (3.36%) high severe
samplei32/ChaCha8Rng/single
time: [4.3550 ns 4.3663 ns 4.3772 ns]
change: [-25.412% -25.156% -24.906%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100000 measurements (0.00%)
4 (0.00%) high mild
samplei32/ChaCha8Rng/distr
time: [2.7254 ns 2.7344 ns 2.7433 ns]
change: [+15.251% +15.837% +16.393%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8503 outliers among 100000 measurements (8.50%)
6055 (6.05%) high mild
2448 (2.45%) high severe
samplei32/Pcg32/single time: [3.6329 ns 3.6435 ns 3.6541 ns]
change: [-24.192% -23.935% -23.661%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg32/distr time: [2.1075 ns 2.1145 ns 2.1215 ns]
change: [+0.0819% +0.5507% +1.0484%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 9153 outliers among 100000 measurements (9.15%)
5330 (5.33%) high mild
3823 (3.82%) high severe
samplei32/Pcg64/single time: [4.2954 ns 4.3080 ns 4.3205 ns]
change: [-21.711% -21.416% -21.098%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg64/distr time: [2.4684 ns 2.4771 ns 2.4858 ns]
change: [-1.9894% -1.5294% -1.0481%] (p = 0.00 < 0.05)
Performance has improved.
Found 9636 outliers among 100000 measurements (9.64%)
5199 (5.20%) high mild
4437 (4.44%) high severe

samplei64/SmallRng/single
time: [5.2532 ns 5.2629 ns 5.2723 ns]
change: [-11.567% -11.342% -11.128%] (p = 0.00 < 0.05)
Performance has improved.
samplei64/SmallRng/distr
time: [1.7670 ns 1.7737 ns 1.7806 ns]
change: [+0.3709% +0.8994% +1.5000%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9659 outliers among 100000 measurements (9.66%)
5357 (5.36%) high mild
4302 (4.30%) high severe
samplei64/ChaCha8Rng/single
time: [6.7406 ns 6.7520 ns 6.7634 ns]
change: [-14.394% -14.216% -14.015%] (p = 0.00 < 0.05)
Performance has improved.
samplei64/ChaCha8Rng/distr
time: [3.5487 ns 3.5596 ns 3.5705 ns]
change: [-0.9413% -0.5079% -0.0649%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 8781 outliers among 100000 measurements (8.78%)
5848 (5.85%) high mild
2933 (2.93%) high severe
samplei64/Pcg32/single time: [5.7866 ns 5.7992 ns 5.8118 ns]
change: [-18.870% -18.598% -18.380%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei64/Pcg32/distr time: [3.2564 ns 3.2654 ns 3.2744 ns]
change: [+11.243% +11.670% +12.117%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8838 outliers among 100000 measurements (8.84%)
5754 (5.75%) high mild
3084 (3.08%) high severe
samplei64/Pcg64/single time: [5.6715 ns 5.6825 ns 5.6933 ns]
change: [-14.281% -14.061% -13.872%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high severe
samplei64/Pcg64/distr time: [2.5092 ns 2.5177 ns 2.5262 ns]
change: [-0.1602% +0.2676% +0.7783%] (p = 0.26 > 0.05)
No change in performance detected.
Found 9574 outliers among 100000 measurements (9.57%)
5084 (5.08%) high mild
4490 (4.49%) high severe

samplei128/SmallRng/single
time: [10.410 ns 10.423 ns 10.436 ns]
change: [-9.4864% -9.3321% -9.1484%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100000 measurements (0.01%)
10 (0.01%) high mild
samplei128/SmallRng/distr
time: [5.6088 ns 5.6180 ns 5.6275 ns]
change: [-1.2694% -1.0568% -0.8129%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8739 outliers among 100000 measurements (8.74%)
5882 (5.88%) high mild
2857 (2.86%) high severe
samplei128/ChaCha8Rng/single
time: [12.332 ns 12.349 ns 12.366 ns]
change: [-14.509% -14.355% -14.176%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100000 measurements (0.01%)
13 (0.01%) high mild
samplei128/ChaCha8Rng/distr
time: [7.9089 ns 7.9233 ns 7.9378 ns]
change: [+0.9915% +1.2787% +1.5530%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8295 outliers among 100000 measurements (8.29%)
6697 (6.70%) high mild
1598 (1.60%) high severe
samplei128/Pcg32/single time: [11.405 ns 11.422 ns 11.440 ns]
change: [-15.114% -14.949% -14.783%] (p = 0.00 < 0.05)
Performance has improved.
samplei128/Pcg32/distr time: [7.4301 ns 7.4443 ns 7.4582 ns]
change: [+4.1310% +4.4150% +4.7024%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8602 outliers among 100000 measurements (8.60%)
6161 (6.16%) high mild
2441 (2.44%) high severe
samplei128/Pcg64/single time: [10.496 ns 10.511 ns 10.526 ns]
change: [-15.273% -15.113% -14.953%] (p = 0.00 < 0.05)
Performance has improved.
samplei128/Pcg64/distr time: [6.3761 ns 6.3885 ns 6.4010 ns]
change: [-2.2301% -1.9907% -1.7150%] (p = 0.00 < 0.05)
Performance has improved.
Found 8753 outliers among 100000 measurements (8.75%)
5805 (5.80%) high mild
2948 (2.95%) high severe

These look not-quite-as-good for single-sampling (but still an improvement), and significantly better for distribution sampling...

...I hate micro-benchmarking (see results in #1286). Looks like we should just use Lemire's method for distribution sampling in all cases.
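To make the single-sampling versus distribution-sampling distinction concrete, here is a minimal sketch of Lemire's method in distribution form for `u32`. The rejection threshold depends only on the range, so it can be computed once at construction, and each sample then costs a multiply and a compare. The type `LemireU32`, its fields, and the closure `next` are hypothetical names for illustration, not the crate's actual implementation.

```rust
/// Hypothetical distribution object for uniform sampling from `low..high`.
struct LemireU32 {
    low: u32,
    range: u32,
    /// Precomputed rejection threshold: (2^32 - range) mod range.
    thresh: u32,
}

impl LemireU32 {
    /// Pays the cost of the modulo once, up front.
    fn new(low: u32, high_exclusive: u32) -> Self {
        assert!(low < high_exclusive, "range must be non-empty");
        let range = high_exclusive - low;
        let thresh = range.wrapping_neg() % range;
        LemireU32 { low, range, thresh }
    }

    /// Draws from `next` (a stand-in for any RNG yielding uniform `u32`
    /// words) until the low half of the product clears the threshold.
    fn sample(&self, mut next: impl FnMut() -> u32) -> u32 {
        loop {
            let m = (next() as u64) * (self.range as u64);
            if (m as u32) >= self.thresh {
                return self.low + (m >> 32) as u32;
            }
        }
    }
}
```

With `LemireU32::new(10, 16)` the threshold is `4`, so only 4 of the 2^32 possible low halves are rejected and almost every call finishes on the first draw.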

vks commented Feb 17, 2023

Looks like we should just use Lemire's method for distribution sampling in all cases.

Agreed, especially if it is less biased.

dhardy commented Feb 18, 2023

Bench re-runs (lower clock speed, better formatted): results.ods

Highlights:

| Benchmark | biased vs base | unbiased vs base | unbiased vs biased |
|---|---|---|---|
| samplei8 ChaCha8Rng distr | 57.00% | 0.10% | -36.30% |
| samplei8 Pcg32 distr | -16.70% | 0.50% | 20.60% |
| samplei8 Pcg64 distr | 2.10% | 0.10% | -2.00% |
| samplei8 SmallRng distr | 70.60% | 1.60% | -40.50% |
| samplei16 ChaCha8Rng distr | 49.00% | -0.20% | -33.00% |
| samplei16 Pcg32 distr | -16.60% | 0.00% | 20.00% |
| samplei16 Pcg64 distr | 2.40% | 0.30% | -2.10% |
| samplei16 SmallRng distr | 74.20% | -2.90% | -44.30% |
| samplei32 ChaCha8Rng distr | 80.80% | 13.50% | -37.20% |
| samplei32 Pcg32 distr | 8.20% | 0.30% | -7.30% |
| samplei32 Pcg64 distr | 11.60% | -0.80% | -11.10% |
| samplei32 SmallRng distr | 12.50% | -0.10% | -11.20% |
| samplei64 ChaCha8Rng distr | 13.30% | -0.50% | -12.20% |
| samplei64 Pcg32 distr | 14.60% | 11.30% | -2.90% |
| samplei64 Pcg64 distr | 12.40% | -0.40% | -11.40% |
| samplei64 SmallRng distr | 11.70% | 0.40% | -10.10% |
| samplei128 ChaCha8Rng distr | 12.50% | -1.40% | -12.30% |
| samplei128 Pcg32 distr | 14.40% | 2.00% | -10.90% |
| samplei128 Pcg64 distr | 9.30% | -2.10% | -10.40% |
| samplei128 SmallRng distr | 20.00% | -0.80% | -17.30% |
| samplei8 ChaCha8Rng single | -9.70% | -11.10% | -1.60% |
| samplei8 Pcg32 single | -12.80% | -0.10% | 14.60% |
| samplei8 Pcg64 single | -8.80% | -5.60% | 3.60% |
| samplei8 SmallRng single | -21.50% | 0.00% | 27.40% |
| samplei16 ChaCha8Rng single | -9.40% | -0.80% | 9.50% |
| samplei16 Pcg32 single | -7.30% | -12.10% | -5.20% |
| samplei16 Pcg64 single | -7.80% | -1.80% | 6.50% |
| samplei16 SmallRng single | -11.00% | -11.10% | -0.10% |
| samplei32 ChaCha8Rng single | -41.70% | -24.90% | 28.90% |
| samplei32 Pcg32 single | -42.20% | -25.30% | 29.20% |
| samplei32 Pcg64 single | -36.50% | -22.20% | 22.60% |
| samplei32 SmallRng single | -40.20% | -25.10% | 25.30% |
| samplei64 ChaCha8Rng single | -26.70% | -12.70% | 19.00% |
| samplei64 Pcg32 single | -29.50% | -18.00% | 16.30% |
| samplei64 Pcg64 single | -28.80% | -15.60% | 18.60% |
| samplei64 SmallRng single | -26.20% | -12.40% | 18.80% |
| samplei128 ChaCha8Rng single | -17.50% | -14.80% | 3.20% |
| samplei128 Pcg32 single | -25.50% | -16.60% | 11.90% |
| samplei128 Pcg64 single | -20.40% | -16.00% | 5.60% |
| samplei128 SmallRng single | -14.60% | -9.90% | 5.50% |

So, yes, this supports the idea that we should always use Lemire's method for distribution sampling.
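For readers checking the highlights, the "unbiased vs biased" figures appear consistent with combining the two "vs base" changes as a ratio of timings. This is an inference from the numbers, not something stated in the spreadsheet, which may well have been computed from the raw times. A small sketch, where `relative_change_pct` is a hypothetical helper:

```rust
/// Given two percentage changes measured against the same baseline,
/// returns the percentage change of `new` relative to `old`.
/// Inputs and output are in percent (e.g. -16.7 means 16.7% faster).
fn relative_change_pct(new_vs_base: f64, old_vs_base: f64) -> f64 {
    ((100.0 + new_vs_base) / (100.0 + old_vs_base) - 1.0) * 100.0
}
```

For samplei8 Pcg32 distr, `relative_change_pct(0.5, -16.7)` gives roughly 20.6, in line with the reported 20.60%. Differences of about a tenth of a percentage point on other rows are expected from rounding of the inputs.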

dhardy commented Feb 18, 2023

Remaining question: whether to keep both biased and unbiased options for single-sampling (using a feature flag). See #494. I am inclined to keep this under the following conditions:

  • Biased is the default (otherwise it is an optimisation that will likely get little use, so why bother).
  • Only the default option is tested by value-stability tests. (Currently achieved by only build-testing with "unbiased" enabled.)

There is not a strong rationale for this, however; we could reduce to just one implementation (either).
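A minimal sketch of how a Cargo feature flag can select between the two single-sampling behaviours at compile time. The feature name "unbiased" comes from this PR; the function `max_extra_draws` is a hypothetical illustration and not part of the crate. It assumes the feature is declared in the crate's `Cargo.toml` under `[features]`.

```rust
/// Upper bound on extra RNG draws for single sampling, chosen at compile time.
/// Default build: stop after one extra draw (two in total), accepting a small bias.
#[cfg(not(feature = "unbiased"))]
pub fn max_extra_draws() -> Option<u32> {
    Some(1)
}

/// Build with `--features unbiased`: no fixed limit on extra draws.
#[cfg(feature = "unbiased")]
pub fn max_extra_draws() -> Option<u32> {
    None
}
```

Only one of the two definitions is compiled into any given build, which is why tests that pin exact output values can cover only the configuration they are built with.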

@dhardy dhardy marked this pull request as ready for review February 20, 2023 10:07

dhardy commented Feb 20, 2023

I'm inclined to merge this as-is. Review please, maybe @TheIronBorn or @vks?

dhardy commented Feb 21, 2023

Thanks @TheIronBorn. Updated.

dhardy commented Mar 23, 2023

I'd like to merge this but am still waiting for a reviewer to approve (policy requires that the review not be by the author). @TheIronBorn you last reviewed this; would you mind revisiting?

@dhardy dhardy merged commit 22d0756 into rust-random:master Mar 24, 2023
benjamin-lieser pushed a commit to benjamin-lieser/rand that referenced this pull request Feb 5, 2025
Also:

* Add uniform distribution benchmarks
* Add "unbiased" feature flag
* Fix feature simd_support
* Uniform: impl PartialEq, Eq where possible
* CI: benches now require small_rng; build-test unbiased