Missing docs on how to run unicode-table-generator #131640

Closed
RalfJung opened this issue Oct 13, 2024 · 14 comments · Fixed by #132499
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@RalfJung
Member

The file library/core/src/unicode/unicode_data.rs says

///! This file is generated by src/tools/unicode-table-generator; do not edit manually!

However, it doesn't say how to run that tool. The "obvious" ./x.py run src/tools/unicode-table-generator does not work. I didn't find anything in the dev guide either.

So I now edited the file by hand instead 🤷 but that can't be the goal of this.^^

Cc @Mark-Simulacrum

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 13, 2024
@jieyouxu
Member

jieyouxu commented Oct 13, 2024

maybe

cargo +nightly run src/tools/unicode-table-generator -- library/core/src/unicode/unicode_data.rs

This looks like a helper util that's not managed by bootstrap AFAICT

EDIT: no, it's not that simple; this will cause src/tools/miri errors because of how the workspace works.

@jieyouxu
Member

I got something working, I think?

PS E:\Repos\rust> ./x run src/tools/unicode-table-generator
Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.06s
Building stage0 tool unicode-table-generator (x86_64-pc-windows-msvc)
   Compiling unicode-table-generator v0.1.0 (E:\Repos\rust\src\tools\unicode-table-generator)
    Finished `release` profile [optimized + debuginfo] target(s) in 1.72s
Alphabetic     : 1727 bytes, 142759 codepoints in 757 ranges (65 - 205744) using skiplist
Case_Ignorable : 1053 bytes, 2749 codepoints in 452 ranges (39 - 918000) using skiplist
Cased          : 407 bytes, 4578 codepoints in 159 ranges (65 - 127370) using skiplist
Cc             : 9 bytes, 65 codepoints in 2 ranges (0 - 160) using skiplist
Grapheme_Extend: 887 bytes, 2193 codepoints in 375 ranges (768 - 918000) using skiplist
Lowercase      : 935 bytes, 2569 codepoints in 675 ranges (97 - 125252) using bitset
N              : 457 bytes, 1911 codepoints in 144 ranges (48 - 130042) using skiplist
Uppercase      : 799 bytes, 1978 codepoints in 656 ranges (65 - 127370) using bitset
there are 25 points
White_Space    : 256 bytes, 25 codepoints in 10 ranges (9 - 12289) using cascading
Total table sizes: 6530 bytes
Build completed successfully in 0:00:15
PS E:\Repos\rust>
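
As a sanity check on the log above, the per-table byte counts add up to the reported total of 6530 bytes. The following is a minimal sketch of that arithmetic, assuming the log format shown above; `sum_table_bytes` is a hypothetical helper and not part of the tool.

```rust
/// Sums the `<n> bytes,` figures from per-table log lines such as
/// `Alphabetic     : 1727 bytes, 142759 codepoints in ...`.
/// Lines without that shape (for example `there are 25 points` or the
/// final `Total table sizes: ...` line) are skipped.
fn sum_table_bytes(log: &str) -> usize {
    log.lines()
        .filter_map(|line| {
            // Take the text after the first ':' and read the leading number.
            let (_, rest) = line.split_once(':')?;
            let mut words = rest.split_whitespace();
            let n = words.next()?.parse::<usize>().ok()?;
            // Only count lines where the number is followed by "bytes,".
            match words.next()? {
                "bytes," => Some(n),
                _ => None,
            }
        })
        .sum()
}
```

Feeding it the nine table lines from the run above gives 1727 + 1053 + 407 + 9 + 887 + 935 + 457 + 799 + 256 = 6530, matching the `Total table sizes` line.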

@RalfJung
Member Author

That's very strange, I just get an error:

$ ./x.py run src/tools/unicode-table-generator
Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.08s
ERROR: no `run` rules matched ["src/tools/unicode-table-generator"]
HELP: run `x.py run --help --verbose` to show a list of available paths
NOTE: if you are adding a new Step to bootstrap itself, make sure you register it with `describe!`
Build completed unsuccessfully in 0:00:00

@jieyouxu
Member

Oh sorry to be clear, I said "I got something working" i.e. I had to hook it up in bootstrap 😆

@RalfJung
Member Author

Ah :D

@jieyouxu
Member

When I ran it after hooking it up, it seemed like it just modified the FIXME messages, lol

PS E:\Repos\rust> git diff .\library\
diff --git a/library/core/src/unicode/unicode_data.rs b/library/core/src/unicode/unicode_data.rs
index db2e3ddd754..143beb37706 100644
--- a/library/core/src/unicode/unicode_data.rs
+++ b/library/core/src/unicode/unicode_data.rs
@@ -18,14 +18,16 @@ const fn bitset_search<
     let bucket_idx = (needle / 64) as usize;
     let chunk_map_idx = bucket_idx / CHUNK_SIZE;
     let chunk_piece = bucket_idx % CHUNK_SIZE;
-    // FIXME(const-hack): Revert to `slice::get` when slice indexing becomes possible in const.
+    // FIXME: const-hack: Revert to `slice::get` after `const_slice_index`
+    // feature stabilizes.
     let chunk_idx = if chunk_map_idx < chunk_idx_map.len() {
         chunk_idx_map[chunk_map_idx]
     } else {
         return false;
     };
     let idx = bitset_chunk_idx[chunk_idx as usize][chunk_piece] as usize;
-    // FIXME(const-hack): Revert to `slice::get` when slice indexing becomes possible in const.
+    // FIXME: const-hack: Revert to `slice::get` after `const_slice_index`
+    // feature stabilizes.
     let word = if idx < bitset_canonical.len() {
         bitset_canonical[idx]
     } else {
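
For context, the FIXME comments in this diff refer to a pattern where a `const fn` performs an explicit bounds check followed by plain indexing, instead of calling `slice::get`. The following is a minimal sketch of that pattern only, not the generated code; `lookup`, `TABLE`, `SECOND`, and `MISSING` are names made up for illustration.

```rust
/// Bounds-checked lookup written so that it can be evaluated in a `const`
/// context: an explicit length check followed by plain indexing.
const fn lookup(table: &[u64], idx: usize) -> Option<u64> {
    if idx < table.len() {
        Some(table[idx])
    } else {
        None
    }
}

// Both lookups are evaluated at compile time, which is why `lookup`
// has to be a `const fn`.
const TABLE: [u64; 3] = [10, 20, 30];
const SECOND: Option<u64> = lookup(&TABLE, 1);
const MISSING: Option<u64> = lookup(&TABLE, 7);
```

Here `SECOND` is `Some(20)` and `MISSING` is `None`; an out-of-range index falls through to the `else` branch, just as `bitset_search` returns `false` in the diff.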

@jieyouxu
Member

jieyouxu commented Oct 13, 2024

@RalfJung I hooked it up to bootstrap as ./x run src/tools/unicode-table-generator in #131647. Maybe you can base your changes on top of that branch if you want to edit the tool itself to generate the unicode tables and run it via bootstrap 🤷. I can't say I have any clue about how that tool is supposed to be run, though.

@jieyouxu jieyouxu added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. labels Oct 13, 2024
@RalfJung
Member Author

> When I ran it after hooking it up, it seemed like it just modified the FIXME messages, lol

That's probably because I patched the unicode_data file to update the comments and didn't realize that this is a generated file with its sources somewhere else, and CI did not stop me. 😂

@jieyouxu
Member

Ok cool, I synced the comments in the tool to your changes, and ./x run src/tools/unicode-table-generator no longer produces a diff for library/core/src/unicode/unicode_data.rs.

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 13, 2024
@jieyouxu
Member

jieyouxu commented Oct 13, 2024

Oh wait, but that means we'll have to sync back changes from #131641 once that's merged again. EDIT: the tool was also updated in that PR.

@jieyouxu
Member

I'll go add a triagebot message to remind that this is generated by the unicode-table-generator tool and should not be hand edited in source, but instead edit the tool itself.

@RalfJung
Member Author

> Oh wait, but that means we'll have to sync back changes from #131641 once that's merged again.

That PR changes both the tool and the generated file so it should not need any new manual sync... unless I screwed up and didn't properly do the same changes on both sides.

> I'll go add a triagebot message to remind that this is generated by the unicode-table-generator tool and should not be hand edited in source, but instead edit the tool itself.

That's a good start, but ideally CI would fail if the file does not match. This could be a tidy check, maybe? Tidy already checks other, similar things.

@jieyouxu
Member

jieyouxu commented Oct 13, 2024

> That PR changes both the tool and the generated file so it should not need any new manual sync... unless I screwed up and didn't properly do the same changes on both sides.

Ah right good point.

> That's a good start, but ideally CI would fail if the file does not match. This could be a tidy check, maybe? Tidy already checks other, similar things.

I mean, it can be checked by trying to run ./x run src/tools/unicode-table-generator then asserting nothing is modified. But yeah, it could be a tidy check. Probably Mark can decide on what to do here.
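
The "regenerate, then assert nothing is modified" idea could look roughly like the following. This is only a sketch of the comparison step, assuming the committed file contents and the freshly generated output are already in memory as strings; `first_stale_line` is a hypothetical helper, not an existing tidy or bootstrap API.

```rust
/// Compares the committed file with freshly generated output line by line
/// (line-ending differences are ignored). Returns the 1-based number of the
/// first line that differs, or `None` if every line matches.
fn first_stale_line(committed: &str, regenerated: &str) -> Option<usize> {
    let mut a = committed.lines();
    let mut b = regenerated.lines();
    let mut line_no = 1;
    loop {
        match (a.next(), b.next()) {
            // Both inputs ended at the same time: nothing is stale.
            (None, None) => return None,
            // Same line on both sides: keep going.
            (x, y) if x == y => line_no += 1,
            // Different text, or one input ended early.
            _ => return Some(line_no),
        }
    }
}
```

A check built on this would fail whenever the function returns `Some(..)`. That is the situation from earlier in this thread, where the hand-edited FIXME comments in unicode_data.rs no longer matched what the tool emits.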

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 20, 2024
…r=Mark-Simulacrum

Register `src/tools/unicode-table-generator` as a runnable tool

It seems like `src/tools/unicode-table-generator` is not currently managed by bootstrap. This PR wires it up with bootstrap as a runnable tool.

This tool seems to take two possible args:

1. (Mandatory) path to `library/core/src/unicode/unicode_data.rs`, and
2. (Optional) path to generate a test file.

I only passed the mandatory path to `unicode_data.rs` in bootstrap and didn't do anything about (2). I'm not sure about how this tool is supposed to be run.

`Cargo.lock` is modified because I renamed `unicode-table-generator`'s bin name to match the tool name, as bootstrap's tool running logic expects the bin name to be derived from the tool name.

I also added a triagebot message reminding contributors not to manually edit the library source file, and to edit the tool and regenerate instead. This should probably be a tidy check (if that's desirable, it can be done in a follow-up PR, though it may be overkill).

Helps with rust-lang#131640 but does not close it because still no docs.

r? `@Mark-Simulacrum` (since I think you authored this tool?)
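
The two arguments described in this commit message could be read along the following lines. This is a hedged sketch, not the tool's actual code; `Args` and `parse_args` are hypothetical names, and the error message is invented for illustration.

```rust
use std::path::PathBuf;

/// Hypothetical holder for the two paths described in the commit message.
#[derive(Debug, PartialEq)]
struct Args {
    /// Mandatory: where to write `unicode_data.rs`.
    output: PathBuf,
    /// Optional: where to write a generated test file.
    test_file: Option<PathBuf>,
}

/// Parses an argument list shaped like `std::env::args()`, where the first
/// item is the program name.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut args = args.into_iter().skip(1);
    let output = args
        .next()
        .map(PathBuf::from)
        .ok_or_else(|| "missing path to unicode_data.rs".to_string())?;
    // The second argument may be absent; that is not an error.
    let test_file = args.next().map(PathBuf::from);
    Ok(Args { output, test_file })
}
```

In a real binary this would be called as `parse_args(std::env::args())`. The bootstrap step described above passes only the first path, which corresponds to `test_file` being `None`.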
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 20, 2024
Rollup merge of rust-lang#131647 - jieyouxu:unicode-table-generator, r=Mark-Simulacrum

workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 2, 2024
unicode_data.rs: show command for generating file

rust-lang#131647 made this an easily runnable tool, now we just have to mention that in the comment. :)

Fixes rust-lang#131640.
@bors bors closed this as completed in b438a5c Nov 3, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 3, 2024
Rollup merge of rust-lang#132499 - RalfJung:unicode_data.rs, r=tgross35
