StepBy<_, Range<_>> optimises poorly #31155
Comments
Here's my ideal step by: https://play.rust-lang.org/?gist=f967691b33eef2f2de70&version=stable For example, the boundary checks in the loop are eliminated. The idea is simply to do exactly what a for loop in C would do. |
I'm guessing this is very likely a case of the |
It looks like it might be possible to get rid of |
I think we decided the user has to ensure that they don't take too many elements from a |
i.e. behavior past the overflow point is not part of the interface. |
@bluss hm, while it's nice to have something to compare against for the simple cases, your ideal version is unfortunately somewhat useless: we obviously want to do what a C |
Maybe useless to replace current step_by, yes. It's a tall order, isn't it? Do what the for loop does, Zero overhead, Check for overflow. Computing the end up front, it sounds doable though. For the |
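As a rough illustration of "computing the end up front", the sketch below works out the number of elements once and then counts down, so the hot loop needs no per-step overflow check. `CountedStep` and `counted_step` are hypothetical names used only for this example; this is not the code from the gist or from the standard library.

```rust
/// Hypothetical iterator: the element count is fixed at construction time.
struct CountedStep {
    cur: u32,
    remaining: u32,
    step: u32,
}

/// Hypothetical constructor for stepping through `start..end` by `step`.
fn counted_step(start: u32, end: u32, step: u32) -> CountedStep {
    assert!(step != 0, "step must be non-zero");
    let remaining = if start < end {
        // ceil((end - start) / step), written so it cannot overflow.
        (end - start - 1) / step + 1
    } else {
        0
    };
    CountedStep { cur: start, remaining, step }
}

impl Iterator for CountedStep {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let value = self.cur;
        // Wrapping is harmless here: a wrapped `cur` is only produced after
        // the last element, and it is never yielded.
        self.cur = self.cur.wrapping_add(self.step);
        Some(value)
    }
}
```

With this shape, `counted_step(0, 10, 3)` yields 0, 3, 6, 9, and a range that ends at `u32::MAX` still terminates.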
I think that making It seems to me that it makes most sense for let mut count = 0;
some_iter.filter_map(|x| { if count == 0 { count = n - 1; Some(x) } else { count -= 1; None } }) (So, yes, some scheme that computes the appropriate upper bound seems like it may be the best plan of attack.) |
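For reference, the `filter_map` snippet in this comment can be written out as a compilable function. `every_nth` is a hypothetical name chosen for the example, and the `n > 0` assertion is an added assumption (with `n == 0` the expression `n - 1` would underflow).

```rust
/// Hypothetical helper: keep the first element and then every `n`-th one,
/// using only `filter_map` and a counter captured by the closure.
fn every_nth<I: Iterator>(iter: I, n: usize) -> impl Iterator<Item = I::Item> {
    assert!(n > 0, "step must be non-zero");
    let mut count = 0;
    iter.filter_map(move |x| {
        if count == 0 {
            // Emit this element, then skip the next `n - 1`.
            count = n - 1;
            Some(x)
        } else {
            count -= 1;
            None
        }
    })
}
```

Note that this version still drives the underlying iterator once per skipped element, which is the cost the thread is trying to avoid for ranges.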
I was against having But I don't agree that it has infinite length. It has a debug assertion for overflow. |
Yes, the debug assertions make the "real" behaviour of a type a little bit of a grey area, but I think one certainly couldn't say that an overflowing version of that iterator is finite: no matter what build configuration you have, you'll never get a |
Very fair point. The nice behavior is certainly preferable. When people pull out a while loop on the forum and discover "this has better performance than step_by", it's because they completely disregard the overflow (wraparound) case, though. RangeFrom is neither finite nor infinite, but it ends with a bang 😉 |
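The wraparound case mentioned here is easy to reproduce with a small type such as `u8`. The sketch below uses two hypothetical helpers: `naive_while_steps` models a hand-written `while` loop as it would behave in a release build (explicit `wrapping_add`, plus a `cap` that exists only to keep the demonstration finite), and `std_steps` goes through the standard `step_by`.

```rust
/// Hypothetical helper: a hand-written loop that ignores overflow.
/// `cap` is not part of the technique; it only stops the runaway loop.
fn naive_while_steps(start: u8, end: u8, step: u8, cap: usize) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = start;
    while i < end && out.len() < cap {
        out.push(i);
        // In a release build `i += step` wraps silently; model that here.
        i = i.wrapping_add(step);
    }
    out
}

/// The same traversal using the standard library's `step_by`.
fn std_steps(start: u8, end: u8, step: usize) -> Vec<u8> {
    (start..end).step_by(step).collect()
}
```

Near the top of the type's range the two disagree: `std_steps(250, 255, 4)` stops after 250 and 254, while the naive loop wraps past 255 and keeps producing small values until the cap is hit.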
Introducing new footguns definitely doesn't sound like a desirable thing. It seems to me that the current behavior (ie. overflow checking) is the most desirable behavior for the general use of the That I think it's worth asking whether we can—and whether it's worth—creating an optimized path for
|
Today's reproduction still doesn't optimize |
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to

1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach can not be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
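A minimal sketch of the idea described in this PR text, assuming a plain `u32` range: `next()` returns the current element and advances by the whole step in a single addition, with no special first call. `RangeStep` and `range_step` are hypothetical names, and treating overflow as "move `start` to `end`" is an assumption made for this illustration, not a quote of the actual standard-library change.

```rust
/// Hypothetical stand-in for a specialised `StepBy<Range<u32>>`.
struct RangeStep {
    start: u32,
    end: u32,
    step: u32,
}

/// Hypothetical constructor for stepping through `start..end` by `step`.
fn range_step(start: u32, end: u32, step: u32) -> RangeStep {
    assert!(step != 0, "step must be non-zero");
    RangeStep { start, end, step }
}

impl Iterator for RangeStep {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.start < self.end {
            let value = self.start;
            // One addition per element. On overflow, mark the range as
            // exhausted so that later calls return `None`.
            self.start = match self.start.checked_add(self.step) {
                Some(next) => next,
                None => self.end,
            };
            Some(value)
        } else {
            None
        }
    }
}
```

A bounded `Range` can record exhaustion by collapsing `start` onto `end`; `RangeFrom` has no `end` field to collapse onto, which matches the PR's remark that there is no way to save "finished-ness" there.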
Today, the feature is no longer in nightly, and we get fully optimized results for both:
closing! |
There's a lot going on inside `StepBy<_, Range<_>>`'s `Iterator` implementation and LLVM does a reasonable job of cutting things down, but doesn't get all the way (definition inlined for context if it changes in future, and to allow easy experimentation):

Optimised asm (it'd be great for the first to be like the second):
https://play.rust-lang.org/?gist=a926869a4cf59d6683c4
#24660 previously had a somewhat similar problem, although this one is compounded by using `checked_add` implemented in terms of LLVM's overflow intrinsics, which the LLVM performance tips explicitly recommend against:
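To make the distinction concrete, the sketch below puts the two ways of expressing an overflow-checked step side by side for `u32`: the library's `checked_add`, and an explicit comparison followed by a plain addition. Both function names are hypothetical, and no claim is made here about which one a given compiler version turns into better code; that would need to be checked in the generated assembly.

```rust
/// Overflow-checked step using the standard `checked_add`.
fn step_with_checked_add(cur: u32, step: u32) -> Option<u32> {
    cur.checked_add(step)
}

/// The same check written as a comparison plus an ordinary addition.
fn step_with_compare(cur: u32, step: u32) -> Option<u32> {
    // `u32::MAX - cur` cannot underflow, so this test is itself safe.
    if step <= u32::MAX - cur {
        Some(cur + step)
    } else {
        None
    }
}
```

The two functions return the same result for every input, so either can be dropped into an experiment like the one linked above.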