Improve `print_tts` by making `space_between` smarter #117433

nnethercote · 2023-10-31T08:31:00Z

space_between currently handles a few cases that make the output nicer. It also gets some cases wrong. This PR fixes the wrong cases, and adds a bunch of extra cases, resulting in prettier output. E.g. these lines:

use smallvec :: SmallVec ;
assert! (mem :: size_of :: < T > () != 0) ;

become these lines:

use smallvec::SmallVec;
assert!(mem::size_of:: < T >() != 0);

This overlaps with #114571, but this PR has the crucial characteristic of giving the same results for all token streams, including those generated by proc macros. For that reason I think it's worth having even if/when #114571 is merged. It's also nice that this PR's improvements can be obtained by modifying only space_between.

r? @petrochenkov

petrochenkov · 2023-10-31T11:58:19Z

This PR changes punctuation jointness in many cases, something I advised against in the previous similar PR - #97340 (comment).

After #114571 lands and after Spacing::Unspecified is added for proc-macro generated non-punctuation tokens space_between heuristic will only be taken into account for Spacing::Unspecified tokens.

So, right now, I think, that would be equivalent to never using space_between if tt1 is a punctuation token.
@rustbot author

petrochenkov · 2023-10-31T12:03:45Z

So, right now, I think, that would be equivalent to never using space_between if tt1 is a punctuation token.

Update: when both tt1 and tt2 are punctuation tokens.
The pretty-printing result is a text, and it will be processed by lexer if treated as Rust code, so the jointness will be erased anyway if tt2 is not a punctuation token.

nnethercote · 2023-11-01T02:39:12Z

Ok, I have updated the code to never remove the space between adjacent punctuation tokens. The following cases are worse:

#![attr] becomes # ![attr]
use A::*; becomes use A:: * ;
< Self as T >::thing becomes < Self as T > ::thing
Struct::< u8 > becomes Struct:: < u8 >
f(x: Vec<u32>, u32) becomes f(x: Vec<u32> , u32)

but overall it's not too bad, and still a lot better than the current output (though #! [attr] becoming # ![attr] is annoying).

The existing space_between gets the "must have space between adjacent tokens" rule wrong for some cases involving ; and $ and , and !. So I have also added some more tests to stringify.rs containing just sequences of operators. The updated space_between gets all these cases right.

I ended up merging all the previous commits that change the behaviour of space_between.

petrochenkov · 2023-11-01T12:54:52Z

compiler/rustc_ast_pretty/src/pprust/state.rs

+
+/// Should two consecutive token trees be printed with a space between them?
+///
+/// NOTE: should always be false if both token trees are punctuation, so that


I don't think this function should return anything in this case, it should rather have assert!(!is_punct(tt1) || !is_punct(tt2)) at the start instead.

The decision should be made based on Spacing in that case, and we should never reach this function.

Or the Spacing-based decision can be made inside this function, then it will be if is_punct(tt1) && is_punct(tt2) { ... } at the start instead of the assert.

I would prefer to avoid making any pretty-printing decisions based on Spacing in this PR. We can leave those to #114571, which will change how space_between is called. I plan to add the Spacing::Unknown in that PR, for tokens coming from proc macros. Those will be the cases where space_between is used.

With that decided, the current position of the assertion has the advantage that it's only checked in the case where space_between returns false.

So I think this is good enough to merge, or do a crater run if you think that is necessary.

Ok.
Crater run is needed in any case.

To avoid `!matches!(...)`, which is hard to think about. Instead every case now uses direct pattern matching and returns true or false. Also add a couple of cases to the `stringify.rs` test that currently print badly.

We currently do the wrong thing on a lot of these. The next commit will fix things.

As well as nicer output, this fixes several FIXMEs in `tests/ui/macros/stringify.rs`, where the current output is sub-optimal.

petrochenkov · 2023-11-02T11:24:06Z

compiler/rustc_ast_pretty/src/pprust/state.rs

+        // NON-PUNCT + `;`: `x = 3;`, `[T; 3]`
+        // NON-PUNCT + `.`: `x.y`, `tup.0`
+        // NON-PUNCT + `:`: `'a: loop { ... }`, `x: u8`, `where T: U`,
+        //     `<Self as T>::x`, `Trait<'a>: Sized`, `X<Y<Z>>: Send`,


These examples still involve punctuation and need an update.

petrochenkov · 2023-11-02T11:38:41Z

@bors try

bors · 2023-11-02T11:39:52Z

⌛ Trying commit 9b9f8f0 with merge 794be3c...

Improve `print_tts` by making `space_between` smarter `space_between` currently handles a few cases that make the output nicer. It also gets some cases wrong. This PR fixes the wrong cases, and adds a bunch of extra cases, resulting in prettier output. E.g. these lines: ``` use smallvec :: SmallVec ; assert! (mem :: size_of :: < T > () != 0) ; ``` become these lines: ``` use smallvec::SmallVec; assert!(mem::size_of:: < T >() != 0); ``` This overlaps with rust-lang#114571, but this PR has the crucial characteristic of giving the same results for all token streams, including those generated by proc macros. For that reason I think it's worth having even if/when rust-lang#114571 is merged. It's also nice that this PR's improvements can be obtained by modifying only `space_between`. r? `@petrochenkov`

bors · 2023-11-02T13:06:14Z

☀️ Try build successful - checks-actions
Build commit: 794be3c (794be3cb2f20e1fd15af31d8a42d380fc36600ed)

petrochenkov · 2023-11-02T13:30:47Z

@craterbot check

craterbot · 2023-11-02T13:30:58Z

👌 Experiment pr-117433 created and queued.
🤖 Automatically detected try build 794be3c
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

craterbot · 2023-11-02T13:31:09Z

🚧 Experiment pr-117433 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

petrochenkov · 2023-11-04T12:06:48Z

@craterbot check p=1 crates=https://crater-reports.s3.amazonaws.com/pr-117433/retry-regressed-list.txt

craterbot · 2023-11-04T12:06:56Z

👌 Experiment pr-117433-1 created and queued.
🤖 Automatically detected try build 794be3c
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

craterbot · 2023-11-04T12:08:03Z

🚧 Experiment pr-117433-1 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

craterbot · 2023-11-04T12:21:25Z

🎉 Experiment pr-117433-1 is completed!
📊 36 regressed and 0 fixed (53 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

petrochenkov · 2023-11-05T12:27:29Z

There are some legitimate regressions here, waiting on author to triage.
@rustbot author

nnethercote · 2023-11-05T23:30:55Z

🎉 Experiment pr-117433-1 is completed! 📊 36 regressed and 0 fixed (53 total) 📰 Open the full report.

I still don't understand how to read crater reports properly. But looking at the "regressed: dependencies" section, there are five problems, all involving conversions of a token stream to a string and then "parsing" the resulting string. Annoying stuff, and I'm not sure how to proceed.

atspi-proxies-0.1.0

It uses atspi-macros-0.4.0, which has this code in src/proxy.rs:

// TODO: this is sketchy as all hell 
// it replaces all mentions of zbus::Result with the Generic std::result::Result, then, adds the Self::Error error type to the second part of the generic
// finally, it replaces all mentions of (String, zbus :: zvairnat :: OwnedObjectPath) with &Self.
// this menas that implementors will need to return a borrowed value of the same type to comply with the type system.
// unsure if this will hold up over time.
fn genericize_method_return_type(rt: &ReturnType) -> TokenStream {
        let original = format!("{}", rt.to_token_stream());
        let mut generic_result = original.replace("zbus :: Result", "std :: result :: Result");
        let end_of_str = generic_result.len();
        generic_result.insert_str(end_of_str - 2, ", Self :: Error");
        let mut generic_impl = generic_result.replace(OBJECT_PAIR_NAME, "Self");
        generic_impl.push_str(" where Self: Sized");
        TokenStream::from_str(&generic_impl).expect("Could not genericize zbus method/property/signal. Attempted to turn \"{generic_result}\" into a TokenStream.")
}

The spacing of zbus :: Result has changed to zbus::Result, which breaks this code.

awto-0.1.2

It uses awto-macros-0.1.2, which has this code in src/proc_macros/schema/database_table.rs:

                let db_type_is_text = ty.to_string().ends_with(":: Text");
                if let Some(max_len) = &field.attrs.max_len {
                    if !db_type_is_text {
                        return Err(syn::Error::new(
                            max_len.span(),
                            "max_len can only be used on varchar & char types",
                        ));
                    }
                    ty = quote!(#ty(Some(#max_len)));
                } else if db_type_is_text {
                    ty = quote!(#ty(None));
                }

The change in spacing of :: Text to ::Text breaks this.

ink-analyzer-ir-0.7.0

crates/macro/src/from_ast.rs has this:

pub fn impl_from_ast(ast: &syn::DeriveInput) -> syn::Result<TokenStream> {                                                                            
    let name = &ast.ident;                                                                                                                            
                                                                                                                                                      
    if let Some(fields) = utils::parse_struct_fields(ast) {                                                                                           
        if let Some(ast_field) = utils::find_field(fields, "ast") {                                                                                   
            let ir_crate_path = utils::get_normalized_ir_crate_path();                                                                                
            let ast_field_type = &ast_field.ty;                                                                                                       
            let ast_type = if ast_field_type                                                                                                          
                .to_token_stream()                                                                                                                    
                .to_string()                                                                                                                          
                .starts_with("ast ::")
            {                                                                                                                                         
                quote! { #ast_field_type }                                                                                                            
            } else {                                                                                                                                  
                quote! { ast::#ast_field_type }                                                                                                       
            };

The change of spacing from ast :: to ast:: breaks this.

kcl-lib-0.1.35

It uses derive-docs-0.4.0 which has this code in src/lib.rs:

    let ret_ty = ast.sig.output.clone();
    let ret_ty_string = ret_ty
        .into_token_stream()
        .to_string()
        .replace("-> ", "")
        .replace("Result < ", "")
        .replace(", KclError >", "");
    let return_type = if !ret_ty_string.is_empty() {
        let ret_ty_string = if ret_ty_string.starts_with("Box <") {
            ret_ty_string
                .trim_start_matches("Box <")
                .trim_end_matches('>')
                .trim()
                .to_string()
        } else {
            ret_ty_string.trim().to_string()
        };

An example return type changes from this:

-> Result < Box < ExtrudeGroup >, KclError > {}

to this:

-> Result < Box < ExtrudeGroup > , KclError > {}

And the space inserted between the > and , breaks the parsing, because trim_end_matches('>') fails due to the trailing space.

pagetop-0.0.46

This uses pagetop-macros-0.0.5, which has this code in in src/lib.rs:

    let args: Vec<String> = fn_item                                                                                                  
        .sig                                                                                                                         
        .inputs                                                                                                                      
        .iter()                                                                                                                      
        .skip(1)                                                                                                                     
        .map(|arg| arg.to_token_stream().to_string())                                                                                
        .collect();                                                                                                                  
                                                                                                                                     
    let param: Vec<String> = args                                                                                                    
        .iter()                                                                                                                      
        .map(|arg| arg.split_whitespace().next().unwrap().to_string())                                                               
        .collect();                                                                                                                  
                                                                                                                                     
    #[rustfmt::skip]                                                                                                                 
    let fn_with = parse_str::<ItemFn>(concat_string!("                                                                               
        pub fn ", fn_name.replace("alter_", "with_"), "(mut self, ", args.join(", "), ") -> Self {                                   
            self.", fn_name, "(", param.join(", "), ");                                                                              
            self                                                                                                                     
        }                                                                                                                            
    ").as_str()).unwrap();

On a signature like this:

    pub fn alter_value(&mut self, value: impl Into<String>) -> &mut Self {

the old pretty printer printed value :, the new one does value:, so the arg.split_whitespace().next() returns value: instead of value, which messes up the subsequent code.

petrochenkov · 2023-11-07T14:30:38Z

I think we can follow the usual breaking change procedure here.

If the regressed crate is alive - send a fix.
Stuff like contains("foo :: bar") can be trivially replaced with contains("foo :: bar") || contains("foo::bar").

If the regressed crate is also popular (e.g. responsible for the largest number of regressions in the report), then special case it in the compiler.
E.g. do not remove spaces around :: if the neighboring tokens are foo and bar.

@rustbot author

nnethercote · 2023-11-08T04:20:12Z

Interestingly, the joint-based pretty printer in #114571 doesn't break the five examples above. Presumably because the affected token streams come from proc macros, and #114571 doesn't make changes to how those are printed.

Now that I better understand the effects of these kinds of changes on real world code, I have the following plan:

Merge the first two commits from this PR in Clarify space_between #117698, because they're useful and make no functional changes.
Give up on improving space_between, or scale it back to just the changes that don't cause errors.
Return to the joint-based approach in Improve print_tts #114571.

…ments See rust-lang/rust#117433 (comment)

nnethercote · 2023-12-20T03:17:27Z

Now that I better understand the effects of these kinds of changes on real world code, I have the following plan:

Merge the first two commits from this PR in Clarify space_between #117698, because they're useful and make no functional changes.

Give up on improving space_between, or scale it back to just the changes that don't cause errors.

Return to the joint-based approach in Improve print_tts #114571.

These have all been done.

rustbot assigned petrochenkov Oct 31, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 31, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2023

nnethercote force-pushed the space_between branch from bae356b to 6e849e8 Compare November 1, 2023 02:39

nnethercote added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 1, 2023

petrochenkov reviewed Nov 1, 2023

View reviewed changes

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 1, 2023

nnethercote added 3 commits November 2, 2023 08:15

Clarify space_between.

c3f939d

To avoid `!matches!(...)`, which is hard to think about. Instead every case now uses direct pattern matching and returns true or false. Also add a couple of cases to the `stringify.rs` test that currently print badly.

More tests for token stream pretty-printing with adjacent punctuation.

f54eb04

We currently do the wrong thing on a lot of these. The next commit will fix things.

Improve token stream pretty-printing.

9b9f8f0

As well as nicer output, this fixes several FIXMEs in `tests/ui/macros/stringify.rs`, where the current output is sub-optimal.

nnethercote force-pushed the space_between branch from 6e849e8 to 9b9f8f0 Compare November 2, 2023 04:59

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 2, 2023

petrochenkov reviewed Nov 2, 2023

View reviewed changes

petrochenkov added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 2, 2023

craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Nov 4, 2023

craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 4, 2023

craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Nov 4, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 5, 2023

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 6, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 7, 2023

nnethercote mentioned this pull request Nov 8, 2023

Clarify space_between #117698

Merged

tjallingt added a commit to tjallingt/awto that referenced this pull request Nov 8, 2023

chore: ensure macro is prepared for potential pretty printing improve…

44a6db5

…ments See rust-lang/rust#117433 (comment)

tjallingt mentioned this pull request Nov 8, 2023

chore: ensure macro is prepared for potential pretty printing improvements tqwewe/awto#3

Merged

tjallingt added a commit to tjallingt/ink-analyzer that referenced this pull request Nov 8, 2023

chore: ensure macro is prepared for potential pretty printing improve…

42d7abc

…ments See rust-lang/rust#117433 (comment)

tjallingt mentioned this pull request Nov 8, 2023

chore: ensure macro is prepared for potential pretty printing improvements ink-analyzer/ink-analyzer#3

Merged

tjallingt added a commit to tjallingt/pagetop that referenced this pull request Nov 8, 2023

chore: ensure macro is prepared for potential pretty printing improve…

91d6b12

…ments See rust-lang/rust#117433 (comment)

tjallingt mentioned this pull request Nov 8, 2023

chore: ensure macro is prepared for potential pretty printing improvements manuelcillero/pagetop#1

Merged

nnethercote closed this Dec 20, 2023

nnethercote mentioned this pull request Jan 22, 2024

Further improve space_between #120227

Merged

Improve print_tts by making space_between smarter #117433

Improve print_tts by making space_between smarter #117433

Uh oh!

Conversation

nnethercote commented Oct 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petrochenkov commented Oct 31, 2023

Uh oh!

petrochenkov commented Oct 31, 2023

Uh oh!

nnethercote commented Nov 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petrochenkov Nov 1, 2023

Choose a reason for hiding this comment

Uh oh!

petrochenkov Nov 1, 2023

Choose a reason for hiding this comment

Uh oh!

nnethercote Nov 2, 2023

Choose a reason for hiding this comment

Uh oh!

petrochenkov Nov 2, 2023

Choose a reason for hiding this comment

Uh oh!

petrochenkov Nov 2, 2023

Choose a reason for hiding this comment

Uh oh!

petrochenkov commented Nov 2, 2023

Uh oh!

bors commented Nov 2, 2023

Uh oh!

bors commented Nov 2, 2023

Uh oh!

petrochenkov commented Nov 2, 2023

Uh oh!

craterbot commented Nov 2, 2023

Uh oh!

craterbot commented Nov 2, 2023

Uh oh!

petrochenkov commented Nov 4, 2023

Uh oh!

craterbot commented Nov 4, 2023

Uh oh!

craterbot commented Nov 4, 2023

Uh oh!

craterbot commented Nov 4, 2023

Uh oh!

petrochenkov commented Nov 5, 2023

Uh oh!

nnethercote commented Nov 5, 2023

atspi-proxies-0.1.0

awto-0.1.2

ink-analyzer-ir-0.7.0

kcl-lib-0.1.35

pagetop-0.0.46

Uh oh!

petrochenkov commented Nov 7, 2023

Uh oh!

nnethercote commented Nov 8, 2023

Uh oh!

nnethercote commented Dec 20, 2023

Uh oh!

Uh oh!

Improve `print_tts` by making `space_between` smarter #117433

Improve `print_tts` by making `space_between` smarter #117433

nnethercote commented Oct 31, 2023 •

edited

Loading

nnethercote commented Nov 1, 2023 •

edited

Loading