speed up String::push and String::insert #124810
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
I had a variety of thoughts; let me know what you think.
Also, is there anything here for which it would make sense to have a codegen test to confirm what's happening? Or some other test to help confirm it's better?
A codegen check for the absence of |
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged. You can start a rebase with the following commands:
The following commits are merge commits: |
9511918 to 89fa55e
89fa55e to 2cb20b3
The proposed implementation uses |
Finished benchmarking commit (26f0bba): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count
This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)
Results (primary 3.2%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results (primary 2.4%, secondary -2.2%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
Results (primary -0.1%, secondary -0.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 774.477s -> 773.266s (-0.16%)
Sorry this has been sitting for so long, I have one last question then I think we can merge this. Mind posting the results of library/alloctests/benches/string.rs if you have run those?
#[doc(hidden)]
#[inline]
#[cfg_attr(bootstrap, rustc_allow_const_fn_unstable(const_mut_refs))]
pub const unsafe fn encode_utf8_raw_unchecked(code: u32, dst: *mut u8) {
It's been long enough that I'm forgetting context here, but why was this changed away from MaybeUninit? Specifically thinking of a signature like
pub const unsafe fn encode_utf8_raw_unchecked(
    code: u32, dst: &mut [MaybeUninit<u8>]
) -> &mut [u8] {
    // Write the characters then call MaybeUninit::assume_init_ref
}
Then lengths get checked and push becomes slightly simpler with core::char::encode_utf8_raw_unchecked(ch as u32, self.vec.spare_capacity_mut()) (maybe needs an assert_unchecked(self.buf.capacity() - self.len > len) if LLVM doesn't pick up on that).
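To make the shape of that suggestion concrete, here is a minimal sketch on stable Rust. The names encode_utf8_uninit and push_via_spare_capacity are hypothetical and this is not the code from the PR. For brevity the bytes come from char::encode_utf8 through a small temporary array, which is exactly the kind of buffer the PR tries to avoid, so only the signature and the caller side are the point here.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper with the signature shape suggested in the review:
/// it takes possibly-uninitialized storage and returns the initialized prefix.
fn encode_utf8_uninit(ch: char, dst: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    // Temporary buffer used only to keep this sketch short.
    let mut tmp = [0u8; 4];
    let encoded = ch.encode_utf8(&mut tmp).as_bytes();
    // Slicing checks the length: this panics if `dst` is too short.
    let out = &mut dst[..encoded.len()];
    for (slot, byte) in out.iter_mut().zip(encoded) {
        slot.write(*byte);
    }
    // SAFETY: every element of `out` was initialized by the loop above, and
    // `MaybeUninit<u8>` has the same layout as `u8`.
    unsafe { &mut *(out as *mut [MaybeUninit<u8>] as *mut [u8]) }
}

/// Caller side: write into the vector's spare capacity, then extend the length.
fn push_via_spare_capacity(buf: &mut Vec<u8>, ch: char) {
    buf.reserve(ch.len_utf8());
    let written = encode_utf8_uninit(ch, buf.spare_capacity_mut()).len();
    // SAFETY: `written` bytes just past the old length were initialized above,
    // and `reserve` guaranteed the capacity for them.
    unsafe { buf.set_len(buf.len() + written) };
}
```

The returned slice carries the written length, so the caller does not need to compute it a second time before calling set_len.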
The benchmarks for
Original benchmark results
before the patch:
after the patch:
If we make the design of the
The improved benchmark code
#[bench]
fn bench_push_char_two_bytes(b: &mut Bencher) {
    b.bytes = REPETITIONS * 2;
    b.iter(|| {
        let mut r = String::new();
        for _ in 0..REPETITIONS {
            black_box(&mut r).push(black_box('â'));
        }
        r
    });
}
before the patch:
after the patch:
I can make an additional commit that improves the benchmark. |
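For anyone who wants to try the same loop without the nightly-only #[bench] harness, a rough stand-in on stable Rust could look like the sketch below. The value of REPETITIONS and the average_time helper are assumptions made for illustration; the thread does not show the real constant, and wall-clock timing like this is much noisier than the bench harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Assumed value; the thread does not show the real constant.
const REPETITIONS: usize = 10_000;

/// Same loop body as the benchmark above: push one two-byte char repeatedly.
fn push_two_byte_chars() -> String {
    let mut r = String::new();
    for _ in 0..REPETITIONS {
        // `black_box` keeps the optimizer from specializing on the constant char.
        black_box(&mut r).push(black_box('â'));
    }
    r
}

/// Hypothetical helper: average wall-clock time of `f` over `runs` calls.
fn average_time<T>(runs: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..runs {
        black_box(f());
    }
    start.elapsed() / runs
}
```

Calling average_time(100, push_two_byte_chars) on a build with and without the patch gives a coarse comparison; the numbers will not match the bench output quoted in this thread.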
In the message above you can observe a regression in
without the patch: 12745.60, 12654.10, 12654.79, 12652.53, 12669.58, 12751.50, 12656.31, 12654.54, 12652.73, 12650.33 (average 12674.2 ns)
Both give the same average, so we can say that the performance doesn't change for a single-byte push. I also tried to add a
For a two-byte push, though, the average time goes from 49850 ns to 14785 ns, giving a 3.37x speedup.
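The arithmetic behind those two figures can be checked with a few lines. This is only a sketch: the sample values are copied from the comment above, and the mean and speedup helpers are hypothetical names.

```rust
/// The ten "without the patch" samples quoted above, in nanoseconds.
const WITHOUT_PATCH_NS: [f64; 10] = [
    12745.60, 12654.10, 12654.79, 12652.53, 12669.58,
    12751.50, 12656.31, 12654.54, 12652.73, 12650.33,
];

/// Arithmetic mean of the samples.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// How many times faster the "after" measurement is than the "before" one.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}
```

Here mean(&WITHOUT_PATCH_NS) comes out at about 12674.2 ns, and speedup(49850.0, 14785.0) at about 3.37.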
The |
9cf92e5 to b6cf666
Sorry about that, the GH UI got me. I missed the const reasoning but that makes sense, so the change LGTM.
Mind squashing the first two commits since the codegen change happens with the implementation change? r=me with that
Improve performance of `String` methods by avoiding unnecessary memcpy for the character bytes, with added codegen check to ensure compliance.
b6cf666 to ff248de
@bors r=tgross35 |
@lincot: 🔑 Insufficient privileges: Not in reviewers |
@bors r+ |
☀️ Test successful - checks-actions |
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 48f89e7 (parent) -> 934880f (this PR)
Test differences
Show 170 test diffs
Stage 1
Stage 2
Additionally, 168 doctest diffs were found. These are ignored, as they are noisy.
Job group index
Job duration changes
How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (934880f): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count
This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)
Results (primary -3.0%, secondary 3.6%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
Results (primary 1.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size
Results (primary -0.1%, secondary -0.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 780.203s -> 779.708s (-0.06%)
Addresses the concerns described in #116235.
The performance gain comes mainly from avoiding temporary buffers.
Complex pattern matching in encode_utf8 (introduced in #67569) has been simplified to a comparison and an exhaustive match in the encode_utf8_raw_unchecked helper function. It takes a slice of MaybeUninit<u8> because otherwise we'd have to construct a normal slice to uninitialized data, which is not desirable, I guess.
Several functions still have that unneeded zeroing, but a single instruction is not that important, I guess.
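As a rough illustration of the approach described above (length chosen by comparison, then an exhaustive match that writes straight into possibly-uninitialized storage), a simplified sketch on stable Rust might look like this. The names encode_utf8_sketch and push_sketch are hypothetical, and unlike the real helper this version is safe and bounds-checked; it is not the code from the PR.

```rust
use std::mem::MaybeUninit;

const TAG_CONT: u8 = 0b1000_0000;
const TAG_TWO: u8 = 0b1100_0000;
const TAG_THREE: u8 = 0b1110_0000;
const TAG_FOUR: u8 = 0b1111_0000;

/// Number of bytes needed to encode `code`, picked by plain comparisons.
fn len_utf8(code: u32) -> usize {
    if code < 0x80 {
        1
    } else if code < 0x800 {
        2
    } else if code < 0x1_0000 {
        3
    } else {
        4
    }
}

/// Writes the UTF-8 bytes of `code` into `dst` and returns how many were written.
/// Indexing panics if `dst` is too short (the real helper skips that check).
fn encode_utf8_sketch(code: u32, dst: &mut [MaybeUninit<u8>]) -> usize {
    let len = len_utf8(code);
    match len {
        1 => {
            dst[0].write(code as u8);
        }
        2 => {
            dst[0].write((code >> 6) as u8 | TAG_TWO);
            dst[1].write((code & 0x3F) as u8 | TAG_CONT);
        }
        3 => {
            dst[0].write((code >> 12) as u8 | TAG_THREE);
            dst[1].write(((code >> 6) & 0x3F) as u8 | TAG_CONT);
            dst[2].write((code & 0x3F) as u8 | TAG_CONT);
        }
        _ => {
            dst[0].write((code >> 18) as u8 | TAG_FOUR);
            dst[1].write(((code >> 12) & 0x3F) as u8 | TAG_CONT);
            dst[2].write(((code >> 6) & 0x3F) as u8 | TAG_CONT);
            dst[3].write((code & 0x3F) as u8 | TAG_CONT);
        }
    }
    len
}

/// Pushes `ch` onto a byte vector without going through a temporary buffer.
fn push_sketch(buf: &mut Vec<u8>, ch: char) {
    let code = ch as u32;
    buf.reserve(len_utf8(code));
    let written = encode_utf8_sketch(code, buf.spare_capacity_mut());
    // SAFETY: `written` bytes just past the old length were initialized above.
    unsafe { buf.set_len(buf.len() + written) };
}
```

Writing through MaybeUninit::write means no &mut [u8] pointing at uninitialized memory is ever created, which is the concern the description mentions.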
@rustbot label T-libs C-optimization A-str