-
Notifications
You must be signed in to change notification settings - Fork 13.4k
More ObligationForest
improvements
#64545
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
`Node` has an optional parent and a list of other descendents. Most of the time the parent is treated the same as the other descendents -- error-handling is the exception -- and chaining them together for iteration has a non-trivial cost. This commit changes the representation. There is now a single list of descendants, and a boolean flag that indicates if there is a parent (in which case it is first descendent). This representation encodes the same information, in a way that is less idiomatic but cheaper to iterate over for the common case where the parent doesn't need special treatment. As a result, some benchmark workloads are up to 2% faster.
The size of the indices doesn't matter much here, and having a `newtype_index!` index type without also using `IndexVec` requires lots of conversions. So this commit removes `NodeIndex` in favour of uniform use of `usize` as the index type. As well as making the code slightly more concise, it also slightly speeds things up.
Now that all indices have type `usize`, it makes sense to be more consistent about their naming. This commit removes all uses of `i` in favour of `index`.
Some local measurements:
Let's see how a CI perf run compares. @bors try |
More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis
☀️ Try build successful - checks-azure |
@rust-timer build 2ae267a |
Queued 2ae267a with parent 5670d04, future comparison URL. |
Finished benchmarking try commit 2ae267a, comparison URL. |
Perf results on CI are similar to what I saw locally: big wins on I think the lesson from this PR (especially commits 1, 4 and 5) is "don't rely on the compiler to fully optimize higher-level constructs in the hottest code". |
PR rust-lang#64545 got a big speed-up by replacing a hot call to `all()` with explicit iteration. This is because the implementation of `all()` is excessively complex: it wraps the given predicate in a closure that returns a `LoopState`, passes that closure to `try_for_each()`, which wraps the first closure in a second closure, passes that second closure to `try_fold()`, which does the actual iteration using the second closure. A sufficient smart compiler could optimize all this away; rustc is currently not sufficiently smart. This commit does the following. - Changes the implementations of `all()`, `any()`, `find()` and `find_map()` to use the simplest possible code, rather than using `try_for_each()`. (I am reminded of "The Evolution of a Haskell Programmer".) These are both shorter and faster than the current implementations, and will permit the undoing of the `all()` removal in rust-lang#64545. - Changes `ResultShunt::next()` so it doesn't call `self.find()`, because that was causing infinite recursion with the new implementation of `find()`, which itself calls `self.next()`. (I honestly don't know how the old implementation of `ResultShunt::next()` didn't cause an infinite loop, given that it also called `self.next()`, albeit via `try_for_each()` and `try_fold()`.) - Changes `nth()` to use `self.next()` in a while loop rather than `for x in self`, because using self-iteration within an iterator method seems dubious, and `self.next()` is used in all the other iterator methods.
Simplify some `Iterator` methods. PR #64545 got a big speed-up by replacing a hot call to `all()` with explicit iteration. This is because the implementation of `all()` is excessively complex: it wraps the given predicate in a closure that returns a `LoopState`, passes that closure to `try_for_each()`, which wraps the first closure in a second closure, passes that second closure to `try_fold()`, which does the actual iteration using the second closure. A sufficient smart compiler could optimize all this away; rustc is currently not sufficiently smart. This commit does the following. - Changes the implementations of `all()`, `any()`, `find()` and `find_map()` to use the simplest possible code, rather than using `try_for_each()`. (I am reminded of "The Evolution of a Haskell Programmer".) These are both shorter and faster than the current implementations, and will permit the undoing of the `all()` removal in #64545. - Changes `ResultShunt::next()` so it doesn't call `self.find()`, because that was causing infinite recursion with the new implementation of `find()`, which itself calls `self.next()`. (I honestly don't know how the old implementation of `ResultShunt::next()` didn't cause an infinite loop, given that it also called `self.next()`, albeit via `try_for_each()` and `try_fold()`.) - Changes `nth()` to use `self.next()` in a while loop rather than `for x in self`, because using self-iteration within an iterator method seems dubious, and `self.next()` is used in all the other iterator methods.
r=me with comment, thanks @nnethercote! |
Amazingly enough, this is a 3.5% instruction count win on `keccak`.
The super-hot call site of `inlined_shallow_resolve()` basically does `r.inlined_shallow_resolve(ty) != ty`. This commit introduces a version of that function specialized for that particular call pattern, `shallow_resolve_changed()`. Incredibly, this reduces the instruction count for `keccak` by 5%. The commit also renames `inlined_shallow_resolve()` as `shallow_resolve()` and removes the `inline(always)` annotation, as it's no longer nearly so hot.
b261edf
to
3b85597
Compare
@bors r=nmatsakis |
📌 Commit 3b85597 has been approved by |
Because this has a perf impact, let's do: |
Since we've already run perf there's probably no reason for that? command is rollup=never btw |
My preference is for perf-affecting PRs to land on their own. If you look at perf.rust-lang.org and see an improvement or regression, and find it's a roll-up, it can be a real pain to work out which PR caused the change. @bors rollup=never |
More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis
☀️ Test successful - checks-azure |
Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis
Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis
While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
Remove manual unrolling from slice::Iter(Mut)::try_fold While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545). --- For context see #64572 (comment)
Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis
Remove manual unrolling from slice::Iter(Mut)::try_fold While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545). --- For context see #64572 (comment)
Following on from #64500, these commits alsomake the code both nicer and faster.
r? @nikomatsakis