-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Optimize array::IntoIter
#100214
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Optimize array::IntoIter
#100214
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
r? @thomcc (rust-highfive has picked a reviewer for you, use r? to override) |
This regresses the When looking into this I noticed that |
Thanks, @timvermeulen, I'll look at that one. I've sent PR #100220 to fix @rustbot author Update 2022-08-24: I've rebased atop the other PR, but still not investigated that perf test, so it's not really review-ready yet. |
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc `@timvermeulen,` who noticed this mistake in rust-lang#100214 (comment)
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc ``@timvermeulen,`` who noticed this mistake in rust-lang#100214 (comment)
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc ```@timvermeulen,``` who noticed this mistake in rust-lang#100214 (comment)
…riplett Properly forward `ByRefSized::fold` to the inner iterator cc ``@timvermeulen,`` who noticed this mistake in rust-lang#100214 (comment)
84f0eaf
to
14e0e7b
Compare
Properly forward `ByRefSized::fold` to the inner iterator cc ``@timvermeulen,`` who noticed this mistake in rust-lang/rust#100214 (comment)
14e0e7b
to
c08ec39
Compare
This comment has been minimized.
This comment has been minimized.
4104932
to
c43b960
Compare
@timvermeulen My mistake turned out to be an obvious one: I'd forgotten to inline some methods, and because it's a non-generic type that meant the benches ended up needing to I now get exactly the same inner loop for that .LBB0_5:
mov edx, dword ptr [rbx + rcx]
rol edx, 8
bswap edx
mov dword ptr [rax + rcx], edx
add rcx, 4
cmp rdi, rcx
jne .LBB0_5 @rustbot ready |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
@rustbot author |
`.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`. This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better. I added one simple demonstration of the improvement as a codegen test.
fb3a7d8
to
6dbd9a2
Compare
Oh, it's failing on the UB trap check. I've ignored the codegen test for debug, since it's a @bors r=thomcc |
⌛ Testing commit 6dbd9a2 with merge da70c0543ba1f77142453c5c82e0a1c1e4e327d3... |
⌛ Testing commit 6dbd9a2 with merge ebbb1c6a0f71f7a95483d2fef3bed8c804b21cbe... |
💔 Test failed - checks-actions |
Looks like the retry didn't put it back in the queue right? I'll try it again. @bors retry |
☀️ Test successful - checks-actions |
Finished benchmarking commit (4ecfdfa): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
Huh, interesting that the post-merge perf run is so different (thankfully better!) from the original: #100214 (comment) |
.into_iter()
on arrays was slower than it needed to be (especially compared to slice iterator) since it usesRange<usize>
, which needs to handle degenerate ranges like10..4
.This PR adds an internal
IndexRange
type that's likeRange<usize>
but with a safety invariant that means it doesn't need to worry about those cases -- it only handlesstart <= end
-- and thus can give LLVM more information to optimize better.I added one simple demonstration of the improvement as a codegen test.
(
vec::IntoIter
uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed.array::IntoIter
can't use pointers because that would keep it from being movable.)