Skip to content

More ObligationForest improvements #64545

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 5 commits into from
Sep 19, 2019
Merged

Conversation

nnethercote
Copy link
Contributor

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis

`Node` has an optional parent and a list of other descendents. Most of
the time the parent is treated the same as the other descendents --
error-handling is the exception -- and chaining them together for
iteration has a non-trivial cost.

This commit changes the representation. There is now a single list of
descendants, and a boolean flag that indicates if there is a parent (in
which case it is first descendent). This representation encodes the same
information, in a way that is less idiomatic but cheaper to iterate over
for the common case where the parent doesn't need special treatment.

As a result, some benchmark workloads are up to 2% faster.
The size of the indices doesn't matter much here, and having a
`newtype_index!` index type without also using `IndexVec` requires lots
of conversions. So this commit removes `NodeIndex` in favour of uniform
use of `usize` as the index type. As well as making the code slightly
more concise, it also slightly speeds things up.
Now that all indices have type `usize`, it makes sense to be more
consistent about their naming. This commit removes all uses of `i` in
favour of `index`.
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 17, 2019
@nnethercote
Copy link
Contributor Author

Some local measurements:

inflate-check
        avg: -10.3%     min: -14.1%     max: -0.0%
keccak-check
        avg: -7.6%      min: -11.7%     max: -0.0%
cranelift-codegen-check
        avg: -1.3%      min: -2.2%      max: -0.0%
clap-rs-check
        avg: -0.7%      min: -1.7%      max: 0.0%
serde-check
        avg: -0.7%      min: -1.2%      max: -0.0%
deeply-nested-check
        avg: -0.5%      min: -0.8%      max: 0.0%
unify-linearly-check
        avg: -0.3%      min: -0.6%      max: 0.0%
style-servo-check
        avg: -0.4%      min: -0.6%      max: 0.0%
wg-grammar-check
        avg: -0.3%      min: -0.5%      max: -0.0%

Let's see how a CI perf run compares.

@bors try

@bors
Copy link
Collaborator

bors commented Sep 17, 2019

⌛ Trying commit b261edf with merge 2ae267a...

bors added a commit that referenced this pull request Sep 17, 2019
More `ObligationForest` improvements

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis
@bors
Copy link
Collaborator

bors commented Sep 17, 2019

☀️ Try build successful - checks-azure
Build commit: 2ae267a

@Mark-Simulacrum
Copy link
Member

@rust-timer build 2ae267a

@rust-timer
Copy link
Collaborator

Queued 2ae267a with parent 5670d04, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 2ae267a, comparison URL.

@nnethercote
Copy link
Contributor Author

Perf results on CI are similar to what I saw locally: big wins on keccak and inflate, smaller wins elsewhere.

I think the lesson from this PR (especially commits 1, 4 and 5) is "don't rely on the compiler to fully optimize higher-level constructs in the hottest code".

nnethercote added a commit to nnethercote/rust that referenced this pull request Sep 18, 2019
PR rust-lang#64545 got a big speed-up by replacing a hot call to `all()` with
explicit iteration. This is because the implementation of `all()` is
excessively complex: it wraps the given predicate in a closure that
returns a `LoopState`, passes that closure to `try_for_each()`, which
wraps the first closure in a second closure, passes that second closure
to `try_fold()`, which does the actual iteration using the second
closure.

A sufficient smart compiler could optimize all this away; rustc is
currently not sufficiently smart.

This commit does the following.

- Changes the implementations of `all()`, `any()`, `find()` and
  `find_map()` to use the simplest possible code, rather than using
  `try_for_each()`. (I am reminded of "The Evolution of a Haskell
  Programmer".) These are both shorter and faster than the current
  implementations, and will permit the undoing of the `all()` removal in
  rust-lang#64545.

- Changes `ResultShunt::next()` so it doesn't call `self.find()`,
  because that was causing infinite recursion with the new
  implementation of `find()`, which itself calls `self.next()`. (I
  honestly don't know how the old implementation of
  `ResultShunt::next()` didn't cause an infinite loop, given that it
  also called `self.next()`, albeit via `try_for_each()` and
  `try_fold()`.)

- Changes `nth()` to use `self.next()` in a while loop rather than `for
  x in self`, because using self-iteration within an iterator method
  seems dubious, and `self.next()` is used in all the other iterator
  methods.
bors added a commit that referenced this pull request Sep 18, 2019
Simplify some `Iterator` methods.

PR #64545 got a big speed-up by replacing a hot call to `all()` with
explicit iteration. This is because the implementation of `all()` is
excessively complex: it wraps the given predicate in a closure that
returns a `LoopState`, passes that closure to `try_for_each()`, which
wraps the first closure in a second closure, passes that second closure
to `try_fold()`, which does the actual iteration using the second
closure.

A sufficient smart compiler could optimize all this away; rustc is
currently not sufficiently smart.

This commit does the following.

- Changes the implementations of `all()`, `any()`, `find()` and
  `find_map()` to use the simplest possible code, rather than using
  `try_for_each()`. (I am reminded of "The Evolution of a Haskell
  Programmer".) These are both shorter and faster than the current
  implementations, and will permit the undoing of the `all()` removal in
  #64545.

- Changes `ResultShunt::next()` so it doesn't call `self.find()`,
  because that was causing infinite recursion with the new
  implementation of `find()`, which itself calls `self.next()`. (I
  honestly don't know how the old implementation of
  `ResultShunt::next()` didn't cause an infinite loop, given that it
  also called `self.next()`, albeit via `try_for_each()` and
  `try_fold()`.)

- Changes `nth()` to use `self.next()` in a while loop rather than `for
  x in self`, because using self-iteration within an iterator method
  seems dubious, and `self.next()` is used in all the other iterator
  methods.
@nikomatsakis
Copy link
Contributor

r=me with comment, thanks @nnethercote!

@Centril Centril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2019
Amazingly enough, this is a 3.5% instruction count win on `keccak`.
The super-hot call site of `inlined_shallow_resolve()` basically does
`r.inlined_shallow_resolve(ty) != ty`. This commit introduces a
version of that function specialized for that particular call pattern,
`shallow_resolve_changed()`. Incredibly, this reduces the instruction
count for `keccak` by 5%.

The commit also renames `inlined_shallow_resolve()` as
`shallow_resolve()` and removes the `inline(always)` annotation, as it's
no longer nearly so hot.
@nnethercote
Copy link
Contributor Author

@bors r=nmatsakis

@bors
Copy link
Collaborator

bors commented Sep 18, 2019

📌 Commit 3b85597 has been approved by nmatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 18, 2019
@nnethercote
Copy link
Contributor Author

Because this has a perf impact, let's do:
@bors rollup-never

@Mark-Simulacrum
Copy link
Member

Since we've already run perf there's probably no reason for that? command is rollup=never btw

@nnethercote
Copy link
Contributor Author

My preference is for perf-affecting PRs to land on their own. If you look at perf.rust-lang.org and see an improvement or regression, and find it's a roll-up, it can be a real pain to work out which PR caused the change.

@bors rollup=never

@nnethercote nnethercote reopened this Sep 18, 2019
@bors
Copy link
Collaborator

bors commented Sep 19, 2019

⌛ Testing commit 3b85597 with merge 9b9d2af...

bors added a commit that referenced this pull request Sep 19, 2019
More `ObligationForest` improvements

Following on from #64500, these commits alsomake the code both nicer and faster.

r? @nikomatsakis
@bors
Copy link
Collaborator

bors commented Sep 19, 2019

☀️ Test successful - checks-azure
Approved by: nmatsakis
Pushing 9b9d2af to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 19, 2019
@bors bors merged commit 3b85597 into rust-lang:master Sep 19, 2019
@nnethercote nnethercote deleted the ObligForest-more branch September 20, 2019 02:55
bors added a commit that referenced this pull request Sep 20, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 20, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
scottmcm added a commit to scottmcm/rust that referenced this pull request Sep 22, 2019
While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
bors added a commit that referenced this pull request Sep 24, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
bors added a commit that referenced this pull request Sep 25, 2019
Even more `ObligationForest` improvements

Following on from #64545, more speed and readability improvements.

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 30, 2019
Remove manual unrolling from slice::Iter(Mut)::try_fold

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone.

I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545).

---

For context see #64572 (comment)
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants