Skip to content

Optimize Wtf8Buf::into_string for the case where it contains UTF-8. #96869

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 5 commits into from
Aug 24, 2022

Conversation

sunfishcode
Copy link
Member

Add a is_known_utf8 flag to Wtf8Buf, which tracks whether the
string is known to contain UTF-8. This is efficiently computed in many
common situations, such as when a Wtf8Buf is constructed from a String
or &str, or with Wtf8Buf::from_wide which is already doing UTF-16
decoding and already checking for surrogates.

This makes OsString::into_string O(1) rather than O(N) on Windows in
common cases.

And, it eliminates the need to scan through the string for surrogates in
Args::next and Vars::next, because the strings are already being
translated with Wtf8Buf::from_wide.

Many things on Windows construct OsStrings with Wtf8Buf::from_wide,
such as DirEntry::file_name and fs::read_link, so with this patch,
users of those functions can subsequently call .into_string() without
paying for an extra scan through the string for surrogates.

r? @ghost

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 9, 2022
@sunfishcode sunfishcode marked this pull request as ready for review May 10, 2022 14:28
@sunfishcode
Copy link
Member Author

r? rust-lang/libs

@joshtriplett
Copy link
Member

@bors r+ rollup=never

@bors
Copy link
Collaborator

bors commented May 17, 2022

📌 Commit f4c93bcbe837ab5220beaa2892fc249f49a9e740 has been approved by joshtriplett

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label May 17, 2022
@bors
Copy link
Collaborator

bors commented May 17, 2022

⌛ Testing commit f4c93bcbe837ab5220beaa2892fc249f49a9e740 with merge 7f13587a59ff3c5350b77bf9334994f91f05d9fc...

@bors
Copy link
Collaborator

bors commented May 17, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 17, 2022
@rust-log-analyzer

This comment has been minimized.

@sunfishcode
Copy link
Member Author

src/librustdoc/html/render/context.rs had an assert that rustdoc's render context was a certain size. It's already not checked on most platforms, as it's just a guard against the context growing unexpectedly, so I've now added a patch to disable the check on Windows too.

@bors
Copy link
Collaborator

bors commented May 28, 2022

☔ The latest upstream changes (presumably #97433) made this pull request unmergeable. Please resolve the merge conflicts.

@crlf0710
Copy link
Member

There's merge conflict now. Needs a rebase.

@crlf0710 crlf0710 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 28, 2022
@sunfishcode
Copy link
Member Author

r? @joshtriplett

@sunfishcode
Copy link
Member Author

@rustbot label -S-waiting-on-author +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 31, 2022
@joshtriplett
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Jun 23, 2022

📌 Commit e620b42644e3404377fa0fae0afd660c6bd77999 has been approved by joshtriplett

@bors
Copy link
Collaborator

bors commented Jun 23, 2022

🌲 The tree is currently closed for pull requests below priority 1000. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 23, 2022
@klensy
Copy link
Contributor

klensy commented Jun 23, 2022

@sunfishcode
Copy link
Member Author

I think I used the github merge-conflict UI and didn't realize that generated a merge commit. I've now fixed that.

r? @joshtriplett

@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 10, 2022
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 13, 2022
@sunfishcode
Copy link
Member Author

Ping @joshtriplett; this is already r+ above, it just needs an approval because of the merge-commit issue which is now fixed.

@jyn514
Copy link
Member

jyn514 commented Aug 23, 2022

@bors r=joshtriplett

@bors
Copy link
Collaborator

bors commented Aug 23, 2022

📌 Commit ddeb936 has been approved by joshtriplett

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 23, 2022
@bors
Copy link
Collaborator

bors commented Aug 24, 2022

⌛ Testing commit ddeb936 with merge 25ea5a3...

@bors
Copy link
Collaborator

bors commented Aug 24, 2022

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing 25ea5a3 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 24, 2022
@bors bors merged commit 25ea5a3 into rust-lang:master Aug 24, 2022
@rustbot rustbot added this to the 1.65.0 milestone Aug 24, 2022
@sunfishcode sunfishcode deleted the main branch August 24, 2022 04:07
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (25ea5a3): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
2.3% [2.3%, 2.3%] 1
Regressions ❌
(secondary)
4.9% [2.6%, 7.2%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.8% [-3.9%, -2.0%] 4
All ❌✅ (primary) 2.3% [2.3%, 2.3%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

/// It is possible for `bytes` to have valid UTF-8 without this being
/// set, such as when we're concatenating `&Wtf8`'s and surrogates become
/// paired, as we don't bother to rescan the entire string.
is_known_utf8: bool,
Copy link
Member

@RalfJung RalfJung Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a field here introduced a subtle bug in PathBuf: #124409.

Privacy-breaking transmutes are "fun". ;)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 12, 2024
…ng-utf8-invariant, r=<try>

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 12, 2024
…ring-utf8-invariant, r=joboet

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jun 12, 2024
Rollup merge of rust-lang#126305 - workingjubilee:fix-os-string-to-string-utf8-invariant, r=joboet

Make PathBuf less Ok with adding UTF-16 then `into_string`

Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869.

try-job: x86_64-msvc
fmease added a commit to fmease/rust that referenced this pull request Feb 26, 2025
…isDenton

Add UTF-8 validation fast paths in `Wtf8Buf`

This adds two more fast paths for UTF-8 validation in `Wtf8Buf`, making use of the `is_known_utf8` flag added in rust-lang#96869 (Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8).

r? `@ChrisDenton`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 26, 2025
Rollup merge of rust-lang#137154 - thaliaarchi:wtf8-fast-paths, r=ChrisDenton

Add UTF-8 validation fast paths in `Wtf8Buf`

This adds two more fast paths for UTF-8 validation in `Wtf8Buf`, making use of the `is_known_utf8` flag added in rust-lang#96869 (Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8).

r? `@ChrisDenton`
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this pull request Mar 11, 2025
…isDenton

Add UTF-8 validation fast paths in `Wtf8Buf`

This adds two more fast paths for UTF-8 validation in `Wtf8Buf`, making use of the `is_known_utf8` flag added in rust-lang#96869 (Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8).

r? `@ChrisDenton`
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this pull request Mar 11, 2025
…isDenton

Add UTF-8 validation fast paths in `Wtf8Buf`

This adds two more fast paths for UTF-8 validation in `Wtf8Buf`, making use of the `is_known_utf8` flag added in rust-lang#96869 (Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8).

r? `@ChrisDenton`
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.