-
Notifications
You must be signed in to change notification settings - Fork 13.5k
Refactor core::char::EscapeDefault and co. structures #105076
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
r? @scottmcm (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
☔ The latest upstream changes (presumably #105671) made this pull request unmergeable. Please resolve the merge conflicts. |
Hello @mina86! I just want to ping you as part of the triage procedure as this PR has merge conflicts :) |
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug} structures from using a state machine to computing escaped sequence upfront and during iteration just going through the characters. This is arguably simpler since it’s easier to think about having a buffer and start..end range to iterate over rather than thinking about a state machine. This also harmonises implementation of aforementioned iterators and core::ascii::EscapeDefault struct. This is done by introducing a new helper EscapeIterInner struct which holds the buffer and offers simple methods for iterating over range. As a side effect, this probably optimises Display implementation for those types since rather than calling write_char repeatedly, write_str is invoked once. On 64-bit platforms, it also reduces size of some of the structs: | Struct | Before | After | |----------------------------+--------+-------+ | core::char::EscapeUnicode | 16 | 12 | | core::char::EscapeDefault | 16 | 12 | | core::char::EscapeDebug | 16 | 16 | My ulterior motive and reason why I started looking into this is addition of as_str method to the iterators. With this change this will became trivial. It’s also going to be trivial to implement DoubleEndedIterator if that’s ever desired.
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for taking a bazillion years to review this. I like the approach of not bothering with an inline state machine for these -- it seems unlikely that people ever want just the first couple bytes without the rest, and just computing them straight-line upfront ought to be net much cheaper than when it's spread over multiple steps.
I've left a bunch of thoughts as I went through, but nothing drastic. Please go through and address them -- either with code changes or by replying to them with why you think the existing is better -- then we can get it landed!
@rustbot author
use crate::num::NonZeroUsize; | ||
use crate::ops::Range; | ||
|
||
const HEX_DIGITS: [u8; 16] = *b"0123456789abcdef"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curiosity: I see that the old escape_ascii
had hex_digits: &[u8; 16]
. Any idea if using it by value here (instead of the reference) makes any difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It most likely doesn’t matter here. I use array out of habit but I’m not even sure it’s relevant for Rust where everything is compiled statically and LTO is common.
@@ -0,0 +1,97 @@ | |||
//! Helper code for character escaping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pondering: it's not obvious to me that a new top-level module is the right place for this.
Maybe have it be ascii/escape.rs
instead, since it's always escaping things to ascii?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn’t want to put it inside ascii
is because of slight differences in escaping. '\0'.escape_default()
produces \u{0}
while 0u8.escape_default()
produces \x00
(also '\0'.escape_debug()
produces \0
). The way I was thinking about it is that std::ascii::escape_default
and char::escape_default
both use EscapeIterInner
so they are at the same ‘hierarchy’ meaning that EscapeIterInner
shouldn’t be inside std::ascii
.
But yes, I understand your concerns and I’m not completely convinced either. Maybe I’m just overthinking this?
Strangely if I try to update the branch I’m getting: $ ./x.py test library/core
...
error: two packages named `la-arena` in this workspace:
- /srv/mpn/rust/src/tools/rust-analyzer/lib/arena/Cargo.toml
- /srv/mpn/rust/src/tools/rust-analyzer/lib/la-arena/Cargo.toml @rustbot ready |
Thanks! @bors r+ |
Rollup of 7 pull requests Successful merges: - rust-lang#105076 (Refactor core::char::EscapeDefault and co. structures) - rust-lang#108161 (Add `ConstParamTy` trait) - rust-lang#108668 (Stabilize debugger_visualizer) - rust-lang#110512 (Fix elaboration with associated type bounds) - rust-lang#110895 (Remove `all` in target_thread_local cfg) - rust-lang#110955 (uplift `clippy::clone_double_ref` as `suspicious_double_ref_op`) - rust-lang#111048 (Mark`feature(return_position_impl_trait_in_trait)` and`feature(async_fn_in_trait)` as not incomplete) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.
This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.
This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct. This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.
As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once. On 64-bit platforms, it also reduces size of some of
the structs:
My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators. With this change this
will became trivial. It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.