Skip to content

Specialize PartialOrd<A> for [A] where A: Ord #39642

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 3 commits into from Feb 11, 2017
Merged

Specialize PartialOrd<A> for [A] where A: Ord #39642

merged 3 commits into from Feb 11, 2017

Conversation

ghost
Copy link

@ghost ghost commented Feb 8, 2017

This way we can call cmp instead of partial_cmp in the loop, removing some burden of optimizing Options away from the compiler.

PR #39538 introduced a regression where sorting slices suddenly became slower, since slice1.lt(slice2) was much slower than slice1.cmp(slice2) == Less. This problem is now fixed.

To verify, I benchmarked this simple program:

fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the is_less lambda to use cmp and partial_cmp. Now all three versions (lt, cmp, partial_cmp) are equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.

Tangentially, as soon as we get default impl, it might be a good idea to implement a blanket default impl for lt, gt, le, ge in terms of cmp whenever possible. Today, those four functions by default are only implemented in terms of partial_cmp.

r? @alexcrichton

This way we can call `cmp` instead of `partial_cmp` in the loop,
removing some burden of optimizing `Option`s away from the compiler.

PR #39538 introduced a regression where sorting slices suddenly became
slower, since `slice1.lt(slice2)` was much slower than
`slice1.cmp(slice2) == Less`. This problem is now fixed.

To verify, I benchmarked this simple program:
```rust
fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}
```

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the `is_less` lambda to use `cmp` and
`partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are
equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.
self.len().partial_cmp(&other.len())
}
}

impl SlicePartialOrd<u8> for [u8] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this impl not be extended to all A: Ord so it would simply use Some(SliceOrd::compare(self, other))?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion, thanks! I've updated the code.

Some(SliceOrd::compare(self, other))
}
}

impl SlicePartialOrd<u8> for [u8] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this specialization could be removed now, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, removed.

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 8, 2017
@alexcrichton
Copy link
Member

Looks good to me, thanks @stjepang!

cc @rust-lang/libs, @bluss, I'm sure this has the likelihood of being super subtle, so would be good to get some more eyes on this as well

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Collaborator

bors commented Feb 10, 2017

📌 Commit a344c12 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Feb 11, 2017

⌛ Testing commit a344c12 with merge f140a6c...

bors added a commit that referenced this pull request Feb 11, 2017
…ichton

Specialize `PartialOrd<A> for [A] where A: Ord`

This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler.

PR #39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed.

To verify, I benchmarked this simple program:
```rust
fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}
```

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.

Tangentially, as soon as we get `default impl`, it might be a good idea to implement a blanket default impl for `lt`, `gt`, `le`, `ge` in terms of `cmp` whenever possible. Today, those four functions by default are only implemented in terms of `partial_cmp`.

r? @alexcrichton
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 11, 2017
…d, r=alexcrichton

Specialize `PartialOrd<A> for [A] where A: Ord`

This way we can call `cmp` instead of `partial_cmp` in the loop, removing some burden of optimizing `Option`s away from the compiler.

PR rust-lang#39538 introduced a regression where sorting slices suddenly became slower, since `slice1.lt(slice2)` was much slower than `slice1.cmp(slice2) == Less`. This problem is now fixed.

To verify, I benchmarked this simple program:
```rust
fn main() {
    let mut v = (0..2_000_000).map(|x| x * x * x * 18913515181).map(|x| vec![x, x ^ 3137831591]).collect::<Vec<_>>();
    v.sort();
}
```

Before this PR, it would take 0.95 sec, and now it takes 0.58 sec.
I also tried changing the `is_less` lambda to use `cmp` and `partial_cmp`. Now all three versions (`lt`, `cmp`, `partial_cmp`) are equally performant for sorting slices - all of them take 0.58 sec on the
benchmark.

Tangentially, as soon as we get `default impl`, it might be a good idea to implement a blanket default impl for `lt`, `gt`, `le`, `ge` in terms of `cmp` whenever possible. Today, those four functions by default are only implemented in terms of `partial_cmp`.

r? @alexcrichton
@bors
Copy link
Collaborator

bors commented Feb 11, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing f140a6c to master...

@bors bors merged commit a344c12 into rust-lang:master Feb 11, 2017
@ghost ghost deleted the specialize-slice-partialord branch February 11, 2017 09:53
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants