-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Specialize PartialOrd<A> for [A] where A: Ord
#39642
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler. PR #39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed. To verify, I benchmarked this simple program: ```rust fn main() { let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>(); v.sort(); } ``` Before this PR, it would take 0.95 sec, and now it takes 0.58 sec. I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the benchmark.
src/libcore/slice.rs
Outdated
self.len().partial_cmp(&other.len()) | ||
} | ||
} | ||
|
||
impl SlicePartialOrd<u8> for [u8] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this impl not be extended to all A: Ord
so it would simply use Some(SliceOrd::compare(self, other))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good suggestion, thanks! I've updated the code.
src/libcore/slice.rs
Outdated
Some(SliceOrd::compare(self, other)) | ||
} | ||
} | ||
|
||
impl SlicePartialOrd<u8> for [u8] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this specialization could be removed now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, removed.
@bors: r+ |
📌 Commit a344c12 has been approved by |
…ichton Specialize `PartialOrd<A> for [A] where A: Ord` This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler. PR #39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed. To verify, I benchmarked this simple program: ```rust fn main() { let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>(); v.sort(); } ``` Before this PR, it would take 0.95 sec, and now it takes 0.58 sec. I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the benchmark. Tangentially, as soon as we get `default impl`, it might be a good idea to implement a blanket default impl for `lt`, `gt`, `le`, `ge` in terms of `cmp` whenever possible. Today, those four functions by default are only implemented in terms of `partial_cmp`. r? @alexcrichton
…d, r=alexcrichton Specialize `PartialOrd<A> for [A] where A: Ord` This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler. PR rust-lang#39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed. To verify, I benchmarked this simple program: ```rust fn main() { let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>(); v.sort(); } ``` Before this PR, it would take 0.95 sec, and now it takes 0.58 sec. I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the benchmark. Tangentially, as soon as we get `default impl`, it might be a good idea to implement a blanket default impl for `lt`, `gt`, `le`, `ge` in terms of `cmp` whenever possible. Today, those four functions by default are only implemented in terms of `partial_cmp`. r? @alexcrichton
☀️ Test successful - status-appveyor, status-travis |
This way we can call
cmp
instead ofpartial_cmp
in the loop, removing some burden of optimizingOption
s away from the compiler.PR #39538 introduced a regression where sorting slices suddenly became slower, since
slice1.lt(slice2)
was much slower thanslice1.cmp(slice2) == Less
. This problem is now fixed.To verify, I benchmarked this simple program:
Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the
is_less
lambda to usecmp
andpartial_cmp
. Now all three versions (lt
,cmp
,partial_cmp
) are equally performant for sorting slices - all of them take 0.58 sec on thebenchmark.
Tangentially, as soon as we get
default impl
, it might be a good idea to implement a blanket default impl forlt
,gt
,le
,ge
in terms ofcmp
whenever possible. Today, those four functions by default are only implemented in terms ofpartial_cmp
.r? @alexcrichton