update to unicode 12.1.0 #11

matklad · 2019-07-21T08:39:31Z

Ran python scripts/unicode.py, copied tables.rs.

Question: what is the maintenance status of this crate? I'd love to switch rustc's lexer to this crate (discussion): this'll make it easier to reuse lexer elsewhere, and will also allow us to remove corresponding unicode tables from libcore.

Manishearth · 2019-07-25T12:43:24Z

It's maintained, but not well. Happy to have help from rustc folks.

Manishearth · 2019-07-25T12:43:51Z

Make a PR for the new version?

matklad · 2019-07-25T12:59:00Z

@Manishearth done #12.

Pulled #7 and #8 there as well

@matklad

Use unicode-xid crate instead of libcore This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock). Reasons to do this: * removing rustc-binary-specific stuff from libcore * making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency) * making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler Reasons not to do this: * increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway. * xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster. <details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>

@matklad

Use unicode-xid crate instead of libcore This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock). Reasons to do this: * removing rustc-binary-specific stuff from libcore * making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency) * making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler Reasons not to do this: * increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway. * xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster. <details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>

@matklad

Use unicode-xid crate instead of libcore This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock). Reasons to do this: * removing rustc-binary-specific stuff from libcore * making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency) * making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler Reasons not to do this: * increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway. * xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster. <details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>

update to unicode 12.1.0

bfc6633

matklad mentioned this pull request Jul 21, 2019

Use unicode-xid crate instead of libcore rust-lang/rust#62848

Merged

Manishearth merged commit 31a29c1 into unicode-rs:master Jul 25, 2019

matklad deleted the new-unicode branch July 25, 2019 12:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

update to unicode 12.1.0 #11

update to unicode 12.1.0 #11

Uh oh!

matklad commented Jul 21, 2019

Uh oh!

Manishearth commented Jul 25, 2019

Uh oh!

Manishearth commented Jul 25, 2019

Uh oh!

matklad commented Jul 25, 2019

Uh oh!

Uh oh!

update to unicode 12.1.0 #11

update to unicode 12.1.0 #11

Uh oh!

Conversation

matklad commented Jul 21, 2019

Uh oh!

Manishearth commented Jul 25, 2019

Uh oh!

Manishearth commented Jul 25, 2019

Uh oh!

matklad commented Jul 25, 2019

Uh oh!

Uh oh!