Ironing out StepBy<Range>'s performance issues #51557
Comments
Overall yes; someone should make a PR 🙂 Probably best to use specialization for now to hide the detail. (I suppose a doc-hidden unstable method on Iterator could work too, but that feels wrong.) A few minor things:
It would still be needed for non-
It's hard to guess at exactly how it should be written without looking at how LLVM handles it. Something like that seems plausible, but I could imagine seemingly-unimportant things like the branch orderings affecting the loop passes. Maybe it would be possible to have a codegen test to make sure?
I've copied the std implementations onto godbolt and added the specialization. As expected, it improves the generated code a lot. Adding this specialization slightly changes what we're expecting from the Is there any way we can get rid of the TryFrom baggage?
I haven't looked at it in detail yet, but reminder that this isn't exactly what we want to reach:

    // what we want to reach
    pub fn manual_while() {
        let mut n = 0;
        while n < UPPER {
            test::black_box(n);
            n += STEP;
        }
    }

Because that doesn't have overflow detection, and is thus an infinite loop for something like
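To make the overflow concern concrete, here is a minimal sketch of the same manual loop with the counter advanced through `checked_add`. The function name `stepped_values`, the `u8` counter type, and the values used in the usage note are assumptions chosen for illustration; they do not come from the thread.

```rust
/// Collects the values a stepped loop over `0..upper` visits.
/// Hypothetical helper: `checked_add` stops the loop instead of
/// letting the counter wrap around.
pub fn stepped_values(upper: u8, step: u8) -> Vec<u8> {
    assert!(step != 0, "step must be non-zero");
    let mut out = Vec::new();
    let mut n: u8 = 0;
    while n < upper {
        out.push(n);
        // A plain `n += step` panics in debug builds and wraps in
        // release builds once `n + step` exceeds `u8::MAX`.
        match n.checked_add(step) {
            Some(next) => n = next,
            None => break, // the next value would not fit, so we are done
        }
    }
    out
}
```

With `upper = 255` and `step = 100`, a wrapping counter would go 0, 100, 200, 44, ... and never reach 255, whereas this version stops after 200, matching `(0u8..255).step_by(100)`.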
Good catch. That's also what caused the difference between manual and specialized for u8.
Another issue I've found is that with
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to

1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach cannot be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
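The "produce the current element and step ahead by `n` in one go" idea can be sketched as a standalone iterator. This is not the PR's code: `SteppedRange`, its fields, and the choice of `u32` are hypothetical, and the overflow handling (parking the cursor at `end`) is just one way to record that the range is exhausted.

```rust
/// Hypothetical stand-in for a specialized `StepBy<Range<u32>>`.
pub struct SteppedRange {
    next: u32, // the element the next call will yield
    end: u32,  // exclusive upper bound
    step: u32, // full stride, not `step - 1`
}

impl SteppedRange {
    pub fn new(start: u32, end: u32, step: u32) -> Self {
        assert!(step != 0, "step must be non-zero");
        SteppedRange { next: start, end, step }
    }
}

impl Iterator for SteppedRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.next >= self.end {
            return None;
        }
        let current = self.next;
        // Pre-step by the whole stride with a single addition.
        // If that overflows, park at `end` so the next call returns `None`.
        self.next = current.checked_add(self.step).unwrap_or(self.end);
        Some(current)
    }
}
```

Because a half-open range can always use `end` as its "finished" marker, no extra flag is needed here. That trick is unavailable for an unbounded range, which is the "finished-ness" problem the description mentions for `RangeFrom`.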
#111850 solved this for unsigned integers. Is that sufficient?
The behaviour of `<Range<_> as Iterator>::nth` has a slight mismatch with `StepBy` (or `Step` depending on your viewpoint) as @scottmcm has found out, resulting in sub-optimal performance.

On every iteration, the range has to first step forwards `n-1` times to get the next element and then advance again by 1.

I'm hoping we can improve `step_by` into a 100% zero-cost abstraction.

It seems like the performance issue is specific to `StepBy<Range>`. I'm thinking therefore that we could specialize `Iterator for StepBy<Range<I>>` such that it would use @scottmcm's suggested semantics. Like this:

That also avoids the branch on a regular `next()`. I haven't looked at the other methods but that boolean in `StepBy` could possibly become superfluous. During construction of the `StepBy` adapter, the `size` in `.step_by(size)` is decremented and this specialization has to counter-add 1 every time but that should be optimized away if inlined.

If someone were to depend on side-effects in `Step::add_usize` (when the trait is stabilized), this pre-stepping would become weird. Same thing with a hypothetical `next_and_skip_ahead()`.

@scottmcm what do you think of this?
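For reference, the generic adapter behaviour this issue describes (a boolean guarding the first call, a stored `size - 1`, and `nth` for every later call) can be modelled roughly as follows. This is a sketch built from the description above, not a copy of the standard library source; `NthStepBy`, `nth_step_by`, and the field names are hypothetical.

```rust
/// Hypothetical model of the generic adapter: the first call uses
/// `next()`, every later call uses `nth(step - 1)`.
pub struct NthStepBy<I> {
    iter: I,
    step_minus_one: usize, // the decremented `size` from `.step_by(size)`
    first_take: bool,      // the boolean that causes a branch in `next()`
}

pub fn nth_step_by<I: Iterator>(iter: I, step: usize) -> NthStepBy<I> {
    assert!(step != 0, "step must be non-zero");
    NthStepBy { iter, step_minus_one: step - 1, first_take: true }
}

impl<I: Iterator> Iterator for NthStepBy<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            // Skips `step - 1` elements, then yields and advances by one
            // more: the two separate advances discussed above.
            self.iter.nth(self.step_minus_one)
        }
    }
}
```

Calling `nth_step_by(0..10, 3)` yields 0, 3, 6, 9, the same sequence as `(0..10).step_by(3)`, but each call after the first goes through `nth` on the underlying range rather than a single addition of the stride.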