Regenerate character tables for Unicode 12.1 #62641

Merged on Jul 24, 2019 (2 commits)

@cuviper
Member

cuviper commented Jul 12, 2019

No description provided.

@rust-highfive
Contributor

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 12, 2019
@cuviper
Member Author

cuviper commented Jul 22, 2019

r? @SimonSapin

@rust-highfive rust-highfive assigned SimonSapin and unassigned bluss Jul 22, 2019
@matklad
Member

matklad commented Jul 24, 2019

Let's just r+ this? I am not an expert in unicode, but this seems straightforward and blocks progress on #62848 :)

@bors r+ rollup

@bors
Collaborator

bors commented Jul 24, 2019

📌 Commit de1e489 has been approved by matklad

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2019
Centril added a commit to Centril/rust that referenced this pull request Jul 24, 2019
Regenerate character tables for Unicode 12.1
Centril added a commit to Centril/rust that referenced this pull request Jul 24, 2019
Rollup of 10 pull requests

Successful merges:

 - rust-lang#62641 (Regenerate character tables for Unicode 12.1)
 - rust-lang#62716 (state also in the intro that UnsafeCell has no effect on &mut)
 - rust-lang#62738 (Remove uses of mem::uninitialized from std::sys::cloudabi)
 - rust-lang#62772 (Suggest trait bound on type parameter when it is unconstrained)
 - rust-lang#62890 (Normalize use of backticks in compiler messages for libsyntax/*)
 - rust-lang#62905 (Normalize use of backticks in compiler messages for doc)
 - rust-lang#62916 (Add test `self-in-enum-definition`)
 - rust-lang#62917 (Always emit trailing slash error)
 - rust-lang#62926 (Fix typo in mem::uninitialized doc)
 - rust-lang#62927 (use PanicMessage in MIR, kill InterpError::description)

Failed merges:

r? @ghost
bors added a commit that referenced this pull request Jul 24, 2019
Rollup of 10 pull requests

Successful merges:

 - #62641 (Regenerate character tables for Unicode 12.1)
 - #62716 (state also in the intro that UnsafeCell has no effect on &mut)
 - #62738 (Remove uses of mem::uninitialized from std::sys::cloudabi)
 - #62772 (Suggest trait bound on type parameter when it is unconstrained)
 - #62890 (Normalize use of backticks in compiler messages for libsyntax/*)
 - #62905 (Normalize use of backticks in compiler messages for doc)
 - #62916 (Add test `self-in-enum-definition`)
 - #62917 (Always emit trailing slash error)
 - #62926 (Fix typo in mem::uninitialized doc)
 - #62927 (use PanicMessage in MIR, kill InterpError::description)

Failed merges:

r? @ghost
@bors bors merged commit de1e489 into rust-lang:master Jul 24, 2019
Centril added a commit to Centril/rust that referenced this pull request Sep 5, 2019
Use unicode-xid crate instead of libcore

This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what a Rust identifier is (`unicode-xid` has almost 10 million downloads, as a `proc_macro2` dependency)
* making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade the Unicode version both in libcore and in unicode-xid. However, this shouldn't be too heavy a burden: just running `./unicode.py` after each new Unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that unicode-xid is an important dependency of syn, *someone* needs to maintain it anyway.
* the unicode-xid implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have a fast path for ASCII anyway, and the code size savings are a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably because checking for whitespace with match is even faster.
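
To make the "compact table with binary search" plus "fast path for ASCII" idea concrete, here is a minimal sketch. It is not the actual code of unicode-xid or libcore: the function names and the tiny `SAMPLE_RANGES` table are hypothetical, and the table is a hand-picked, truncated subset for illustration only. Real tables are generated from the Unicode Character Database.

```rust
use std::cmp::Ordering;

/// Hypothetical, heavily truncated table of inclusive code point ranges.
/// It must be sorted and non-overlapping for binary search to work.
/// Illustration only; not the generated XID_Start table.
const SAMPLE_RANGES: &[(char, char)] = &[
    ('\u{00C0}', '\u{00D6}'), // Latin-1 letters before the multiplication sign
    ('\u{00D8}', '\u{00F6}'), // Latin-1 letters after the multiplication sign
    ('\u{0391}', '\u{03A1}'), // Greek capital letters Alpha..Rho
    ('\u{4E00}', '\u{9FEF}'), // CJK unified ideographs (approximate bounds)
];

/// Binary search over the sorted range table: O(log n) comparisons,
/// with two `char`s of storage per range.
fn in_ranges(c: char, ranges: &[(char, char)]) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less // this range lies entirely below `c`
            } else if lo > c {
                Ordering::Greater // this range lies entirely above `c`
            } else {
                Ordering::Equal // `c` falls inside this range
            }
        })
        .is_ok()
}

/// Sketch of an XID_Start-like check: ASCII is decided by a `match`
/// and never touches the table; everything else is binary-searched.
fn is_xid_start_sketch(c: char) -> bool {
    match c {
        'a'..='z' | 'A'..='Z' => true,
        c if c.is_ascii() => false,
        c => in_ranges(c, SAMPLE_RANGES),
    }
}
```

With this shape, source text that is mostly ASCII rarely reaches the binary search, which is the reason the slower table lookup is argued not to matter in practice.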

<details>

<summary>old description</summary>

Followup to rust-lang#59706

r? @eddyb

Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641.

cc unicode-rs/unicode-xid#11

</details>
@cuviper cuviper deleted the unicode-12.1 branch April 3, 2020 18:40