More `ObligationForest` improvements #64545

nnethercote · 2019-09-17T11:48:25Z

Following on from #64500, these commits alsomake the code both nicer and faster.

`Node` has an optional parent and a list of other descendents. Most of the time the parent is treated the same as the other descendents -- error-handling is the exception -- and chaining them together for iteration has a non-trivial cost. This commit changes the representation. There is now a single list of descendants, and a boolean flag that indicates if there is a parent (in which case it is first descendent). This representation encodes the same information, in a way that is less idiomatic but cheaper to iterate over for the common case where the parent doesn't need special treatment. As a result, some benchmark workloads are up to 2% faster.

The size of the indices doesn't matter much here, and having a `newtype_index!` index type without also using `IndexVec` requires lots of conversions. So this commit removes `NodeIndex` in favour of uniform use of `usize` as the index type. As well as making the code slightly more concise, it also slightly speeds things up.

Now that all indices have type `usize`, it makes sense to be more consistent about their naming. This commit removes all uses of `i` in favour of `index`.

nnethercote · 2019-09-17T11:49:49Z

Some local measurements:

inflate-check
        avg: -10.3%     min: -14.1%     max: -0.0%
keccak-check
        avg: -7.6%      min: -11.7%     max: -0.0%
cranelift-codegen-check
        avg: -1.3%      min: -2.2%      max: -0.0%
clap-rs-check
        avg: -0.7%      min: -1.7%      max: 0.0%
serde-check
        avg: -0.7%      min: -1.2%      max: -0.0%
deeply-nested-check
        avg: -0.5%      min: -0.8%      max: 0.0%
unify-linearly-check
        avg: -0.3%      min: -0.6%      max: 0.0%
style-servo-check
        avg: -0.4%      min: -0.6%      max: 0.0%
wg-grammar-check
        avg: -0.3%      min: -0.5%      max: -0.0%

Let's see how a CI perf run compares.

@bors try

bors · 2019-09-17T11:50:02Z

⌛ Trying commit b261edf with merge 2ae267a...

@nikomatsakis

More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis

bors · 2019-09-17T14:39:45Z

☀️ Try build successful - checks-azure
Build commit: 2ae267a

Mark-Simulacrum · 2019-09-17T20:45:47Z

@rust-timer build 2ae267a

rust-timer · 2019-09-17T20:45:48Z

Queued 2ae267a with parent 5670d04, future comparison URL.

rust-timer · 2019-09-17T23:02:13Z

Finished benchmarking try commit 2ae267a, comparison URL.

nnethercote · 2019-09-17T23:44:29Z

Perf results on CI are similar to what I saw locally: big wins on keccak and inflate, smaller wins elsewhere.

I think the lesson from this PR (especially commits 1, 4 and 5) is "don't rely on the compiler to fully optimize higher-level constructs in the hottest code".

PR rust-lang#64545 got a big speed-up by replacing a hot call to `all()` with explicit iteration. This is because the implementation of `all()` is excessively complex: it wraps the given predicate in a closure that returns a `LoopState`, passes that closure to `try_for_each()`, which wraps the first closure in a second closure, passes that second closure to `try_fold()`, which does the actual iteration using the second closure. A sufficient smart compiler could optimize all this away; rustc is currently not sufficiently smart. This commit does the following. - Changes the implementations of `all()`, `any()`, `find()` and `find_map()` to use the simplest possible code, rather than using `try_for_each()`. (I am reminded of "The Evolution of a Haskell Programmer".) These are both shorter and faster than the current implementations, and will permit the undoing of the `all()` removal in rust-lang#64545. - Changes `ResultShunt::next()` so it doesn't call `self.find()`, because that was causing infinite recursion with the new implementation of `find()`, which itself calls `self.next()`. (I honestly don't know how the old implementation of `ResultShunt::next()` didn't cause an infinite loop, given that it also called `self.next()`, albeit via `try_for_each()` and `try_fold()`.) - Changes `nth()` to use `self.next()` in a while loop rather than `for x in self`, because using self-iteration within an iterator method seems dubious, and `self.next()` is used in all the other iterator methods.

Simplify some `Iterator` methods. PR #64545 got a big speed-up by replacing a hot call to `all()` with explicit iteration. This is because the implementation of `all()` is excessively complex: it wraps the given predicate in a closure that returns a `LoopState`, passes that closure to `try_for_each()`, which wraps the first closure in a second closure, passes that second closure to `try_fold()`, which does the actual iteration using the second closure. A sufficient smart compiler could optimize all this away; rustc is currently not sufficiently smart. This commit does the following. - Changes the implementations of `all()`, `any()`, `find()` and `find_map()` to use the simplest possible code, rather than using `try_for_each()`. (I am reminded of "The Evolution of a Haskell Programmer".) These are both shorter and faster than the current implementations, and will permit the undoing of the `all()` removal in #64545. - Changes `ResultShunt::next()` so it doesn't call `self.find()`, because that was causing infinite recursion with the new implementation of `find()`, which itself calls `self.next()`. (I honestly don't know how the old implementation of `ResultShunt::next()` didn't cause an infinite loop, given that it also called `self.next()`, albeit via `try_for_each()` and `try_fold()`.) - Changes `nth()` to use `self.next()` in a while loop rather than `for x in self`, because using self-iteration within an iterator method seems dubious, and `self.next()` is used in all the other iterator methods.

src/librustc/traits/fulfill.rs

nikomatsakis · 2019-09-18T19:04:06Z

r=me with comment, thanks @nnethercote!

Amazingly enough, this is a 3.5% instruction count win on `keccak`.

The super-hot call site of `inlined_shallow_resolve()` basically does `r.inlined_shallow_resolve(ty) != ty`. This commit introduces a version of that function specialized for that particular call pattern, `shallow_resolve_changed()`. Incredibly, this reduces the instruction count for `keccak` by 5%. The commit also renames `inlined_shallow_resolve()` as `shallow_resolve()` and removes the `inline(always)` annotation, as it's no longer nearly so hot.

nnethercote · 2019-09-18T20:39:57Z

@bors r=nmatsakis

bors · 2019-09-18T20:39:58Z

📌 Commit 3b85597 has been approved by nmatsakis

nnethercote · 2019-09-18T21:03:25Z

Because this has a perf impact, let's do:
@bors rollup-never

Mark-Simulacrum · 2019-09-18T21:07:01Z

Since we've already run perf there's probably no reason for that? command is rollup=never btw

nnethercote · 2019-09-18T22:55:44Z

My preference is for perf-affecting PRs to land on their own. If you look at perf.rust-lang.org and see an improvement or regression, and find it's a roll-up, it can be a real pain to work out which PR caused the change.

@bors rollup=never

bors · 2019-09-19T06:54:00Z

⌛ Testing commit 3b85597 with merge 9b9d2af...

@nikomatsakis

More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis

bors · 2019-09-19T10:44:56Z

☀️ Test successful - checks-azure
Approved by: nmatsakis
Pushing 9b9d2af to master...

@nikomatsakis

Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis

@nikomatsakis

Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis

While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).

Remove manual unrolling from slice::Iter(Mut)::try_fold While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545). --- For context see #64572 (comment)

@nikomatsakis

Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis

Remove manual unrolling from slice::Iter(Mut)::try_fold While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545). --- For context see #64572 (comment)

nnethercote added 3 commits September 17, 2019 15:32

Rename some index variables.

7f6e160

Now that all indices have type `usize`, it makes sense to be more consistent about their naming. This commit removes all uses of `i` in favour of `index`.

rust-highfive assigned nikomatsakis Sep 17, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 17, 2019

bors added a commit that referenced this pull request Sep 17, 2019

Auto merge of #64545 - nnethercote:ObligForest-more, r=<try>

2ae267a

More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis

nnethercote mentioned this pull request Sep 18, 2019

Simplify some Iterator methods. #64572

Closed

nikomatsakis requested changes Sep 18, 2019

View reviewed changes

src/librustc/traits/fulfill.rs Show resolved Hide resolved

Centril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2019

nnethercote added 2 commits September 19, 2019 06:35

Use explicit iteration instead of all() in process_obligation().

82e5c6e

Amazingly enough, this is a 3.5% instruction count win on `keccak`.

nnethercote force-pushed the ObligForest-more branch from b261edf to 3b85597 Compare September 18, 2019 20:36

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 18, 2019

nnethercote closed this Sep 18, 2019

nnethercote reopened this Sep 18, 2019

bors added a commit that referenced this pull request Sep 19, 2019

Auto merge of #64545 - nnethercote:ObligForest-more, r=nmatsakis

9b9d2af

More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 19, 2019

bors merged commit 3b85597 into rust-lang:master Sep 19, 2019

nnethercote deleted the ObligForest-more branch September 20, 2019 02:55

nnethercote mentioned this pull request Sep 20, 2019

Even more ObligationForest improvements #64627

Merged

bors added a commit that referenced this pull request Sep 20, 2019

Auto merge of #64627 - nnethercote:ObligForest-even-more, r=<try>

281fcd7

Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis

bors added a commit that referenced this pull request Sep 20, 2019

Auto merge of #64627 - nnethercote:ObligForest-even-more, r=<try>

85a4a15

Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis

scottmcm mentioned this pull request Sep 22, 2019

Remove manual unrolling from slice::Iter(Mut)::try_fold #64600

Merged

bors added a commit that referenced this pull request Sep 25, 2019

Auto merge of #64627 - nnethercote:ObligForest-even-more, r=nikomatsakis

c7bc0bf

Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis

More ObligationForest improvements #64545

More ObligationForest improvements #64545

Uh oh!

Conversation

nnethercote commented Sep 17, 2019

Uh oh!

nnethercote commented Sep 17, 2019

Uh oh!

bors commented Sep 17, 2019

Uh oh!

bors commented Sep 17, 2019

Uh oh!

Mark-Simulacrum commented Sep 17, 2019

Uh oh!

rust-timer commented Sep 17, 2019

Uh oh!

rust-timer commented Sep 17, 2019

Uh oh!

nnethercote commented Sep 17, 2019

Uh oh!

Uh oh!

nikomatsakis commented Sep 18, 2019

Uh oh!

nnethercote commented Sep 18, 2019

Uh oh!

bors commented Sep 18, 2019

Uh oh!

nnethercote commented Sep 18, 2019

Uh oh!

Mark-Simulacrum commented Sep 18, 2019

Uh oh!

nnethercote commented Sep 18, 2019

Uh oh!

bors commented Sep 19, 2019

Uh oh!

bors commented Sep 19, 2019

Uh oh!

Uh oh!

More `ObligationForest` improvements #64545

More `ObligationForest` improvements #64545