Optimize insertion sort #40807
Conversation
This change slightly restructures the main iteration loop so that LLVM can optimize it more efficiently. Benchmark:

| name | before ns/iter | after ns/iter | diff ns/iter | diff % |
|------|----------------|---------------|--------------|--------|
| `slice::sort_unstable_small_ascending` | 39 (2051 MB/s) | 38 (2105 MB/s) | -1 | -2.56% |
| `slice::sort_unstable_small_big_random` | 579 (2210 MB/s) | 575 (2226 MB/s) | -4 | -0.69% |
| `slice::sort_unstable_small_descending` | 80 (1000 MB/s) | 70 (1142 MB/s) | -10 | -12.50% |
| `slice::sort_unstable_small_random` | 396 (202 MB/s) | 386 | -10 | -2.53% |

The benchmark is not a fluke: performance on `small_descending` is consistently better after this change. I'm not 100% sure why this makes things faster, but my guess would be that, to the compiler, `v.len()+1` looks like it could in theory overflow.
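The actual diff isn't quoted in this conversation. As a rough sketch under that caveat, the change described (iterating over `1..v.len()` and working on `v[..i + 1]`, instead of iterating up to `v.len() + 1`) looks roughly like the following, with standalone insertion sorts standing in for the libcore internals:

```rust
fn insertion_sort_before(v: &mut [i32]) {
    // Old shape: the upper bound `v.len() + 1` is what the compiler must
    // assume could, in theory, overflow.
    for i in 2..v.len() + 1 {
        // Sink the last element of v[..i] into its sorted position.
        let mut j = i - 1;
        while j > 0 && v[j] < v[j - 1] {
            v.swap(j, j - 1);
            j -= 1;
        }
    }
}

fn insertion_sort_after(v: &mut [i32]) {
    // New shape: iterate over `1..v.len()` and sort `v[..i + 1]` instead,
    // so no `+ 1` appears in the loop bound.
    for i in 1..v.len() {
        // Sink the last element of v[..i + 1] into its sorted position.
        let mut j = i;
        while j > 0 && v[j] < v[j - 1] {
            v.swap(j, j - 1);
            j -= 1;
        }
    }
}

fn main() {
    let mut a = [5, 3, 9, 1, 4];
    let mut b = a;
    insertion_sort_before(&mut a);
    insertion_sort_after(&mut b);
    assert_eq!(a, [1, 3, 4, 5, 9]);
    assert_eq!(a, b);
}
```

Both loops visit the same prefixes; the only difference is where the `+ 1` lives, which is enough to change what LLVM can prove about the bounds.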
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override)

@bors: r+

📌 Commit 2c816f7 has been approved by alexcrichton
…=alexcrichton

Optimize insertion sort

This change slightly changes the main iteration loop so that LLVM can optimize it more efficiently. Benchmark:

```
name                                    before ns/iter   after ns/iter    diff ns/iter  diff %
slice::sort_unstable_small_ascending    39 (2051 MB/s)   38 (2105 MB/s)   -1            -2.56%
slice::sort_unstable_small_big_random   579 (2210 MB/s)  575 (2226 MB/s)  -4            -0.69%
slice::sort_unstable_small_descending   80 (1000 MB/s)   70 (1142 MB/s)   -10           -12.50%
slice::sort_unstable_small_random       396 (202 MB/s)   386              -10           -2.53%
```

The benchmark is not a fluke. I can see that performance on `small_descending` is consistently better after this change. I'm not 100% sure why this makes things faster, but my guess would be that `v.len()+1` to the compiler looks like it could in theory overflow.
Since this is internal to libstd/core, could you check whether the inclusive range syntax makes things better as well? i.e.

@nagisa Inclusive range syntax makes performance slightly worse, actually... With the old insertion sort and with inclusive range syntax there's a bounds check at the beginning of insertion sort. This PR removes that bounds check. If you want to play with this, here's a playpen link.
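The playpen snippet isn't reproduced here; as a hypothetical sketch of the inclusive-range formulation being compared (again using a standalone insertion sort rather than the libcore code):

```rust
fn insertion_sort_inclusive(v: &mut [i32]) {
    // Same iteration space as `2..v.len() + 1`, written with inclusive-range
    // syntax. Per the comment above, this variant still leaves a bounds check
    // at the start of the sort, so it benchmarked slightly worse than the
    // `1..v.len()` formulation used in this PR.
    for i in 2..=v.len() {
        // Sink the last element of v[..i] into its sorted position.
        let mut j = i - 1;
        while j > 0 && v[j] < v[j - 1] {
            v.swap(j, j - 1);
            j -= 1;
        }
    }
}

fn main() {
    let mut a = [5, 3, 9, 1, 4];
    insertion_sort_inclusive(&mut a);
    assert_eq!(a, [1, 3, 4, 5, 9]);
}
```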
⌛ Testing commit 2c816f7 with merge 04e47d7...