Use unicode-xid crate instead of libcore #62848
/// be an alphabetic character followed by any number of alphanumeric
/// characters.
/// Parses a word starting at the current position. A word is the same as
/// Rust identifier, except that it can't start with `_` character.
fn word(&mut self) -> &'a str {
Note that we (accidentally?) don't gate on non-ascii idents here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c5673f4b78f88393d9ba4f46f4838d38
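For readers following along, here is a minimal standalone sketch of the rule stated in the doc comment on `word`: identifier-like, but not allowed to start with `_`. This is not the parser's actual code. The function name `word_prefix` is hypothetical, and `char::is_alphabetic` / `char::is_alphanumeric` from the standard library are used only as stand-ins for the XID_Start / XID_Continue checks, so the accepted set is an approximation.

```rust
/// Hypothetical helper: returns the longest prefix of `input` that looks
/// like a "word" (identifier-like, but must not start with `_`).
/// ASSUMPTION: `is_alphabetic` / `is_alphanumeric` approximate the
/// XID_Start / XID_Continue properties; they are not identical to them.
fn word_prefix(input: &str) -> &str {
    let mut chars = input.char_indices();

    // The first character must be a letter; `_` is rejected here because
    // it is not alphabetic.
    match chars.next() {
        Some((_, c)) if c.is_alphabetic() => {}
        _ => return "",
    }

    // Subsequent characters may be letters, digits, or `_`.
    for (i, c) in chars {
        if !(c.is_alphanumeric() || c == '_') {
            // `i` is a byte offset on a char boundary, so slicing is safe.
            return &input[..i];
        }
    }
    input
}
```

Note that this sketch, like the code under review, accepts non-ASCII letters without any feature gate.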
Well, one rebase seems better than a separate PR (especially for code that can be just re-generated).
Makes sense, added a commit that regenerates the table using the existing Unicode version. However, removing
cc #62869
☔ The latest upstream changes (presumably #62902) made this pull request unmergeable. Please resolve the merge conflicts.
☔ The latest upstream changes (presumably #62935) made this pull request unmergeable. Please resolve the merge conflicts.
e0e26cd to 57f11b8
This is now blocked on unicode-rs/unicode-xid#11. cc @Manishearth :-)
@rustbot modify labels to +T-compiler
@rfcbot fcp merge
☔ The latest upstream changes (presumably #62990) made this pull request unmergeable. Please resolve the merge conflicts.
@rfcbot ask @rust-lang/compiler Should we move to using unicode-xid?
On the call site, `rustc_lexer::is_whitespace` reads much better than `character_properties::is_whitespace`.
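To make the call-site point concrete, here is a rough sketch of what a `match`-based whitespace predicate can look like. The code points listed are the Unicode `Pattern_White_Space` set; whether `rustc_lexer::is_whitespace` matched exactly this set at the time of this PR is an assumption, and the name `is_whitespace_sketch` is hypothetical.

```rust
/// Hypothetical sketch of a `match`-based whitespace check.
/// ASSUMPTION: the set below is Unicode's Pattern_White_Space; the real
/// `rustc_lexer::is_whitespace` may differ in detail.
fn is_whitespace_sketch(c: char) -> bool {
    match c {
        '\u{0009}' // horizontal tab
        | '\u{000A}' // line feed
        | '\u{000B}' // vertical tab
        | '\u{000C}' // form feed
        | '\u{000D}' // carriage return
        | '\u{0020}' // space
        | '\u{0085}' // next line
        | '\u{200E}' // left-to-right mark
        | '\u{200F}' // right-to-left mark
        | '\u{2028}' // line separator
        | '\u{2029}' // paragraph separator
        => true,
        _ => false,
    }
}
```

A small closed set like this needs no lookup table at all, which fits the observation in rust-lang#59706 that a plain `match` was fast.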
@bors r+
📌 Commit 206fe8e has been approved by
⌛ Testing commit 206fe8e with merge b4d47710a7240f0050536d56c19d246a5ae30d5b...
⌛ Testing commit 206fe8e with merge 1a2a386dae552c1e1ef02586fd24fe7d15c39c61...
@bors retry prioritizing r0llup since queue is really long.
⌛ Testing commit 206fe8e with merge 722de75f4e4fd9f8c7844664289725117ea40d1b...
Use unicode-xid crate instead of libcore

This PR proposes to remove the `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use the `unicode_xid` crate from crates.io instead (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what a Rust identifier is (`unicode-xid` has almost 10 million downloads, as a `proc_macro2` dependency)
* making it easier to share the `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade the Unicode version both in libcore and in unicode-xid. However, this shouldn't be too heavy a burden: just running `./unicode.py` after a new Unicode version is released. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that unicode-xid is an important dependency of syn, *someone* needs to maintain it anyway.
* the unicode-xid implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have a fast path for ASCII anyway, and the code size savings are a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably because checking for whitespace with `match` is even faster.

<details>
<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11
</details>
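The performance trade-off described above (an ASCII fast path in front of a binary search over a compact range table) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `SAMPLE_START_RANGES` is a tiny hand-picked list of letter ranges, not the generated XID_Start table, and `in_ranges` / `is_id_start_sketch` are hypothetical names rather than the API of `unicode-xid` or `rustc_lexer`.

```rust
use std::cmp::Ordering;

/// ASSUMPTION: a tiny, hand-picked, sorted, non-overlapping sample of
/// letter ranges. The real table is generated from Unicode data and is
/// far larger.
const SAMPLE_START_RANGES: &[(char, char)] = &[
    ('\u{00C0}', '\u{00D6}'), // Latin-1 letters before the multiplication sign
    ('\u{00D8}', '\u{00F6}'), // Latin-1 letters after the multiplication sign
    ('\u{0391}', '\u{03A1}'), // Greek capital letters alpha..rho
    ('\u{0410}', '\u{044F}'), // basic Cyrillic letters
    ('\u{4E00}', '\u{9FA5}'), // CJK unified ideographs (original block)
];

/// Binary search over inclusive `(lo, hi)` ranges.
fn in_ranges(c: char, ranges: &[(char, char)]) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less // this range lies entirely below `c`
            } else if lo > c {
                Ordering::Greater // this range lies entirely above `c`
            } else {
                Ordering::Equal // `c` falls inside this range
            }
        })
        .is_ok()
}

/// Hypothetical identifier-start check: ASCII is decided by a plain
/// `match`, and only non-ASCII characters reach the table lookup.
fn is_id_start_sketch(c: char) -> bool {
    match c {
        'a'..='z' | 'A'..='Z' | '_' => true,
        _ if c.is_ascii() => false,
        _ => in_ranges(c, SAMPLE_START_RANGES),
    }
}
```

Since most source code is ASCII, the table lookup is rarely reached, which is why the slower lookup structure is expected not to matter in practice.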
@bors retry rolled up.
⌛ Testing commit 206fe8e with merge aa4e70a7a1f891b5f40f78cf83f936a659ab0572...
@bors retry rolled up.
Rollup of 11 pull requests

Successful merges:

- #62848 (Use unicode-xid crate instead of libcore)
- #63774 (Fix `window.hashchange is not a function`)
- #63930 (Account for doc comments coming from proc macros without spans)
- #64003 (place: Passing `align` = `layout.align.abi`, when also passing `layout`)
- #64030 (Fix unlock ordering in SGX synchronization primitives)
- #64041 (use TokenStream rather than &[TokenTree] for built-in macros)
- #64051 (Add x86_64-linux-kernel target)
- #64063 (Fix const_err with `-(-0.0)`)
- #64083 (Point at appropriate arm on type error on if/else/match with one non-! arm)
- #64100 (Fix const eval bug breaking run-pass tests in Miri)
- #64157 (Opaque type locations in error message for clarity.)

Failed merges:

r? @ghost