Refactor core::char::EscapeDefault and co. structures #105076

mina86 · 2022-11-29T23:35:56Z

Change the core::char::{EscapeUnicode, EscapeDefault, EscapeDebug}
structures from using a state machine to computing the escaped sequence
upfront and, during iteration, simply walking through its characters.

This is arguably simpler, since it’s easier to reason about a buffer
and a start..end range to iterate over than about a state machine.

This also harmonises the implementation of the aforementioned iterators
and the core::ascii::EscapeDefault struct. This is done by introducing
a new helper struct, EscapeIterInner, which holds the buffer and offers
simple methods for iterating over the range.
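A minimal sketch of the buffer-plus-range idea, assuming a simplified layout: the names `EscapeIterInner`, `data`, `alive`, `new`, `next` and `len` below are illustrative and not necessarily the fields or methods of the real core helper. The escaped bytes are written into a fixed-size array once, and iteration only advances a `Range<u8>` over it.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for a helper like `EscapeIterInner`:
/// a fixed-size buffer holding the already-escaped bytes, plus the range
/// of bytes that have not been yielded yet.
struct EscapeIterInner<const N: usize> {
    data: [u8; N],
    alive: Range<u8>,
}

impl<const N: usize> EscapeIterInner<N> {
    fn new(data: [u8; N], alive: Range<u8>) -> Self {
        // The live range must stay within the buffer.
        assert!(usize::from(alive.end) <= N);
        Self { data, alive }
    }

    /// Returns the next escaped byte, or `None` once the range is empty.
    fn next(&mut self) -> Option<u8> {
        self.alive.next().map(|i| self.data[usize::from(i)])
    }

    /// Number of bytes still to be yielded (Range<u8> is an ExactSizeIterator).
    fn len(&self) -> usize {
        self.alive.len()
    }
}
```

All of the iteration state is the two `u8` bounds of the range; there is no per-step branching on an escape state.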

As a side effect, this probably speeds up the Display implementations
of those types, since write_str is invoked once rather than write_char
being called repeatedly. On 64-bit platforms, it also reduces the size
of some of the structs:

| Struct                    | Before (bytes) | After (bytes) |
|---------------------------|---------------:|--------------:|
| core::char::EscapeUnicode |             16 |            12 |
| core::char::EscapeDefault |             16 |            12 |
| core::char::EscapeDebug   |             16 |            16 |
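To illustrate both side effects, here is a hypothetical `EscapeSketch` struct (not the real core type) holding a 10-byte buffer, which is enough for the longest escape `\u{10ffff}`, plus a `Range<u8>`. Its Display impl emits the remaining escape with a single `write_str` call, and because every field has alignment 1 it occupies 10 + 2 = 12 bytes. That this is exactly the layout behind the 12-byte figures in the table is an assumption.

```rust
use std::fmt;
use std::ops::Range;

/// Hypothetical escape iterator for a `char`: the escaped form is
/// computed up front into `data`, and `alive` marks the unread part.
struct EscapeSketch {
    data: [u8; 10], // "\u{10ffff}" is the longest escape: 10 bytes
    alive: Range<u8>,
}

impl fmt::Display for EscapeSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let bytes = &self.data[usize::from(self.alive.start)..usize::from(self.alive.end)];
        // The buffer only ever holds ASCII, so this is not expected to fail;
        // if it somehow did, report it as a formatting error.
        let s = std::str::from_utf8(bytes).map_err(|_| fmt::Error)?;
        // One write_str call for the whole remaining escape,
        // instead of one write_char call per character.
        f.write_str(s)
    }
}
```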

My ulterior motive, and the reason I started looking into this, is
the addition of an as_str method to the iterators. With this change,
that will become trivial. It will also be trivial to implement
DoubleEndedIterator if that’s ever desired.
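Why `as_str` and `DoubleEndedIterator` become trivial can be seen in a small sketch, again using a hypothetical `Escaped` type rather than the real iterators: the unread part of the escape is a contiguous slice of the buffer, and iterating from the back only shrinks the range from its end.

```rust
use std::ops::Range;

/// Hypothetical buffer-plus-range iterator (not the real core type).
struct Escaped<const N: usize> {
    data: [u8; N],
    alive: Range<u8>,
}

impl<const N: usize> Escaped<N> {
    /// The not-yet-yielded part of the escape is just a slice of the buffer.
    fn as_str(&self) -> &str {
        let bytes = &self.data[usize::from(self.alive.start)..usize::from(self.alive.end)];
        // Only ASCII is ever written into `data`.
        std::str::from_utf8(bytes).expect("escape buffer holds ASCII")
    }
}

impl<const N: usize> Iterator for Escaped<N> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.alive.next().map(|i| char::from(self.data[usize::from(i)]))
    }
}

impl<const N: usize> DoubleEndedIterator for Escaped<N> {
    fn next_back(&mut self) -> Option<char> {
        // Walking backwards only means shrinking the range from the end.
        self.alive.next_back().map(|i| char::from(self.data[usize::from(i)]))
    }
}
```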

rustbot · 2022-11-29T23:36:03Z

r? @scottmcm

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2022-11-29T23:36:05Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

- Stabilizing library features
- Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
- Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
- Changing public documentation in ways that create new stability guarantees
- Changing observable runtime behavior of library APIs

bors · 2023-02-12T14:50:52Z

☔ The latest upstream changes (presumably #105671) made this pull request unmergeable. Please resolve the merge conflicts.

anden3 · 2023-04-05T16:04:58Z

Hello @mina86! I just want to ping you as part of the triage procedure as this PR has merge conflicts :)


mina86 · 2023-04-05T17:14:34Z

> Hello @mina86! I just want to ping you as part of the triage procedure as this PR has merge conflicts :)

Done.

scottmcm

Sorry for taking a bazillion years to review this. I like the approach of not bothering with an inline state machine for these -- it seems unlikely that people ever want just the first couple bytes without the rest, and just computing them straight-line upfront ought to be net much cheaper than when it's spread over multiple steps.

I've left a bunch of thoughts as I went through, but nothing drastic. Please go through and address them -- either with code changes or by replying to them with why you think the existing is better -- then we can get it landed!

@rustbot author
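The straight-line upfront computation described in the review might look roughly like the following for `\u{...}` escapes. This is a sketch under assumptions: `escape_unicode_upfront` is a hypothetical name, and counting hex digits via `leading_zeros` is one possible technique, not necessarily what the PR does.

```rust
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";

/// Hypothetical helper: writes `\u{XXXX}` for `c` into a 10-byte buffer
/// in one straight-line pass and returns the buffer plus the used length.
fn escape_unicode_upfront(c: char) -> ([u8; 10], u8) {
    let mut buf = [0u8; 10];
    let c = u32::from(c);
    // Number of hex digits needed, at least 1 so that '\0' becomes \u{0}.
    // `c | 1` avoids counting all 32 leading zeros for c == 0.
    let digits = (8 - (c | 1).leading_zeros() / 4) as usize;
    buf[0] = b'\\';
    buf[1] = b'u';
    buf[2] = b'{';
    for i in 0..digits {
        // Emit the most significant nibble first.
        let shift = 4 * (digits - 1 - i);
        buf[3 + i] = HEX_DIGITS[((c >> shift) & 0xf) as usize];
    }
    buf[3 + digits] = b'}';
    (buf, (4 + digits) as u8)
}
```

Everything is computed in one call; an iterator over the result then only needs the buffer and a range.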

scottmcm · 2023-04-29T22:18:10Z

library/core/src/escape.rs

```diff
+use crate::num::NonZeroUsize;
+use crate::ops::Range;
+
+const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";
```


curiosity: I see that the old escape_ascii had hex_digits: &[u8; 16]. Any idea if using it by value here (instead of the reference) makes any difference?

It most likely doesn’t matter here. I use an array by value out of habit, but I’m not even sure it makes a difference in Rust, where everything is compiled statically and LTO is common.
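For context, the sketch below indexes a `HEX_DIGITS` constant by value to build `\xNN` escapes for bytes, following the rules documented for `std::ascii::escape_default`: tab, CR, LF, backslash, single quote and double quote get backslash escapes, printable ASCII 0x20..=0x7e passes through, and everything else becomes lowercase `\xNN`. The function name `escape_ascii_byte` is hypothetical, and the sketch does not measure whether by-value or by-reference access changes the generated code.

```rust
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef";

/// Hypothetical sketch of byte escaping in the style of
/// `std::ascii::escape_default`, using `HEX_DIGITS` by value.
/// Returns a 4-byte buffer and the number of bytes used.
fn escape_ascii_byte(byte: u8) -> ([u8; 4], u8) {
    match byte {
        b'\t' => ([b'\\', b't', 0, 0], 2),
        b'\r' => ([b'\\', b'r', 0, 0], 2),
        b'\n' => ([b'\\', b'n', 0, 0], 2),
        b'\\' => ([b'\\', b'\\', 0, 0], 2),
        b'\'' => ([b'\\', b'\'', 0, 0], 2),
        b'"' => ([b'\\', b'"', 0, 0], 2),
        // Printable ASCII (checked after the special cases above).
        0x20..=0x7e => ([byte, 0, 0, 0], 1),
        _ => {
            // Everything else becomes \xNN with lowercase hex digits.
            let hi = HEX_DIGITS[usize::from(byte >> 4)];
            let lo = HEX_DIGITS[usize::from(byte & 0xf)];
            ([b'\\', b'x', hi, lo], 4)
        }
    }
}
```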

scottmcm · 2023-04-29T22:21:27Z

library/core/src/escape.rs

```diff
@@ -0,0 +1,97 @@
+//! Helper code for character escaping.
```


pondering: it's not obvious to me that a new top-level module is the right place for this.

Maybe have it be ascii/escape.rs instead, since it's always escaping things to ascii?

The reason I didn’t want to put it inside ascii is the slight differences in escaping: '\0'.escape_default() produces \u{0}, while std::ascii::escape_default(0) produces \x00 (and '\0'.escape_debug() produces \0). The way I was thinking about it is that std::ascii::escape_default and char::escape_default both use EscapeIterInner, so they sit at the same level of the ‘hierarchy’, which means EscapeIterInner shouldn’t live inside std::ascii.

But yes, I understand your concerns and I’m not completely convinced either. Maybe I’m just overthinking this?
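The differences in escaping NUL can be checked with the standard library's public APIs; the wrapper function `nul_escapes` is only an illustrative name. The three calls are expected to produce `\u{0}`, `\x00` and `\0` respectively.

```rust
/// Collects three escapes of NUL using the standard library's public APIs:
/// char escape_default, byte escape_default, and char escape_debug.
fn nul_escapes() -> (String, String, String) {
    let char_default = '\0'.escape_default().to_string();
    let byte_default = std::ascii::escape_default(0).to_string();
    let char_debug = '\0'.escape_debug().to_string();
    (char_default, byte_default, char_debug)
}
```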


mina86 · 2023-04-30T13:42:47Z

Strangely, if I try to update the branch, I get:

```
$ ./x.py test library/core
...
error: two packages named `la-arena` in this workspace:
- /srv/mpn/rust/src/tools/rust-analyzer/lib/arena/Cargo.toml
- /srv/mpn/rust/src/tools/rust-analyzer/lib/la-arena/Cargo.toml
```

@rustbot ready

scottmcm · 2023-04-30T18:36:14Z

Thanks!

@bors r+

bors · 2023-04-30T18:36:16Z

📌 Commit 76c9947 has been approved by scottmcm

It is now in the queue for this repository.

Rollup of 7 pull requests

Successful merges:

- rust-lang#105076 (Refactor core::char::EscapeDefault and co. structures)
- rust-lang#108161 (Add `ConstParamTy` trait)
- rust-lang#108668 (Stabilize debugger_visualizer)
- rust-lang#110512 (Fix elaboration with associated type bounds)
- rust-lang#110895 (Remove `all` in target_thread_local cfg)
- rust-lang#110955 (uplift `clippy::clone_double_ref` as `suspicious_double_ref_op`)
- rust-lang#111048 (Mark `feature(return_position_impl_trait_in_trait)` and `feature(async_fn_in_trait)` as not incomplete)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup

rustbot assigned scottmcm Nov 29, 2022

rustbot added S-waiting-on-review and T-libs labels Nov 29, 2022

scottmcm mentioned this pull request Feb 12, 2023

Add an "ascii character" type to reduce unsafe needs in conversions rust-lang/libs-team#179

Closed

anden3 added S-waiting-on-author and removed S-waiting-on-review labels Apr 5, 2023

mina86 force-pushed the a branch from f77a652 to 4510439 April 5, 2023 17:13

anden3 added S-waiting-on-review and removed S-waiting-on-author labels Apr 5, 2023

scottmcm requested changes Apr 29, 2023

rustbot added S-waiting-on-author and removed S-waiting-on-review labels Apr 29, 2023

mina86 added 2 commits April 30, 2023 03:59

review

4d0f7e2

a bit more usize::from

76c9947

rustbot added S-waiting-on-review and removed S-waiting-on-author labels Apr 30, 2023

bors added S-waiting-on-bors and removed S-waiting-on-review labels Apr 30, 2023

scottmcm approved these changes Apr 30, 2023

Dylan-DPC mentioned this pull request May 2, 2023

Rollup of 7 pull requests #111089

Merged

bors merged commit f916c44 into rust-lang:master May 2, 2023

rustbot added this to the 1.71.0 milestone May 2, 2023

scottmcm mentioned this pull request May 4, 2023

Tracking Issue for ascii::Char (ACP 179) #110998

Open

7 tasks

mina86 deleted the a branch January 27, 2024 06:45
