Skip to content

Improve print_tts by making space_between smarter #117433

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
wants to merge 3 commits into from

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Oct 31, 2023

space_between currently handles a few cases that make the output nicer. It also gets some cases wrong. This PR fixes the wrong cases, and adds a bunch of extra cases, resulting in prettier output. E.g. these lines:

use smallvec :: SmallVec ;
assert! (mem :: size_of :: < T > () != 0) ;

become these lines:

use smallvec::SmallVec;
assert!(mem::size_of:: < T >() != 0);

This overlaps with #114571, but this PR has the crucial characteristic of giving the same results for all token streams, including those generated by proc macros. For that reason I think it's worth having even if/when #114571 is merged. It's also nice that this PR's improvements can be obtained by modifying only space_between.

r? @petrochenkov

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 31, 2023
@petrochenkov
Copy link
Contributor

This PR changes punctuation jointness in many cases, something I advised against in the previous similar PR - #97340 (comment).

After #114571 lands and after Spacing::Unspecified is added for proc-macro generated non-punctuation tokens space_between heuristic will only be taken into account for Spacing::Unspecified tokens.

So, right now, I think, that would be equivalent to never using space_between if tt1 is a punctuation token.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2023
@petrochenkov
Copy link
Contributor

So, right now, I think, that would be equivalent to never using space_between if tt1 is a punctuation token.

Update: when both tt1 and tt2 are punctuation tokens.
The pretty-printing result is a text, and it will be processed by lexer if treated as Rust code, so the jointness will be erased anyway if tt2 is not a punctuation token.

@nnethercote
Copy link
Contributor Author

nnethercote commented Nov 1, 2023

Ok, I have updated the code to never remove the space between adjacent punctuation tokens. The following cases are worse:

  • #![attr] becomes # ![attr]
  • use A::*; becomes use A:: * ;
  • < Self as T >::thing becomes < Self as T > ::thing
  • Struct::< u8 > becomes Struct:: < u8 >
  • f(x: Vec<u32>, u32) becomes f(x: Vec<u32> , u32)

but overall it's not too bad, and still a lot better than the current output (though #! [attr] becoming # ![attr] is annoying).

The existing space_between gets the "must have space between adjacent tokens" rule wrong for some cases involving ; and $ and , and !. So I have also added some more tests to stringify.rs containing just sequences of operators. The updated space_between gets all these cases right.

I ended up merging all the previous commits that change the behaviour of space_between.

@nnethercote nnethercote added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 1, 2023

/// Should two consecutive token trees be printed with a space between them?
///
/// NOTE: should always be false if both token trees are punctuation, so that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function should return anything in this case, it should rather have assert!(!is_punct(tt1) || !is_punct(tt2)) at the start instead.

The decision should be made based on Spacing in that case, and we should never reach this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the Spacing-based decision can be made inside this function, then it will be if is_punct(tt1) && is_punct(tt2) { ... } at the start instead of the assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid making any pretty-printing decisions based on Spacing in this PR. We can leave those to #114571, which will change how space_between is called. I plan to add the Spacing::Unknown in that PR, for tokens coming from proc macros. Those will be the cases where space_between is used.

With that decided, the current position of the assertion has the advantage that it's only checked in the case where space_between returns false.

So I think this is good enough to merge, or do a crater run if you think that is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.
Crater run is needed in any case.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 1, 2023
To avoid `!matches!(...)`, which is hard to think about. Instead every
case now uses direct pattern matching and returns true or false.

Also add a couple of cases to the `stringify.rs` test that currently
print badly.
We currently do the wrong thing on a lot of these. The next commit will
fix things.
As well as nicer output, this fixes several FIXMEs in
`tests/ui/macros/stringify.rs`, where the current output is sub-optimal.
@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 2, 2023
// NON-PUNCT + `;`: `x = 3;`, `[T; 3]`
// NON-PUNCT + `.`: `x.y`, `tup.0`
// NON-PUNCT + `:`: `'a: loop { ... }`, `x: u8`, `where T: U`,
// `<Self as T>::x`, `Trait<'a>: Sized`, `X<Y<Z>>: Send`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples still involve punctuation and need an update.

@petrochenkov
Copy link
Contributor

@bors try

@petrochenkov petrochenkov added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 2, 2023
@bors
Copy link
Collaborator

bors commented Nov 2, 2023

⌛ Trying commit 9b9f8f0 with merge 794be3c...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 2, 2023
Improve `print_tts` by making `space_between` smarter

`space_between` currently handles a few cases that make the output nicer. It also gets some cases wrong. This PR fixes the wrong cases, and adds a bunch of extra cases, resulting in prettier output. E.g. these lines:
```
use smallvec :: SmallVec ;
assert! (mem :: size_of :: < T > () != 0) ;
```
become these lines:
```
use smallvec::SmallVec;
assert!(mem::size_of:: < T >() != 0);
```
This overlaps with rust-lang#114571, but this PR has the crucial characteristic of giving the same results for all token streams, including those generated by proc macros. For that reason I think it's worth having even if/when rust-lang#114571 is merged. It's also nice that this PR's improvements can be obtained by modifying only `space_between`.

r? `@petrochenkov`
@bors
Copy link
Collaborator

bors commented Nov 2, 2023

☀️ Try build successful - checks-actions
Build commit: 794be3c (794be3cb2f20e1fd15af31d8a42d380fc36600ed)

@petrochenkov
Copy link
Contributor

@craterbot check

@craterbot
Copy link
Collaborator

👌 Experiment pr-117433 created and queued.
🤖 Automatically detected try build 794be3c
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot
Copy link
Collaborator

🚧 Experiment pr-117433 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Nov 4, 2023
@petrochenkov
Copy link
Contributor

@craterbot
Copy link
Collaborator

👌 Experiment pr-117433-1 created and queued.
🤖 Automatically detected try build 794be3c
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 4, 2023
@craterbot
Copy link
Collaborator

🚧 Experiment pr-117433-1 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot
Copy link
Collaborator

🎉 Experiment pr-117433-1 is completed!
📊 36 regressed and 0 fixed (53 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Nov 4, 2023
@petrochenkov
Copy link
Contributor

There are some legitimate regressions here, waiting on author to triage.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 5, 2023
@nnethercote
Copy link
Contributor Author

🎉 Experiment pr-117433-1 is completed! 📊 36 regressed and 0 fixed (53 total) 📰 Open the full report.

I still don't understand how to read crater reports properly. But looking at the "regressed: dependencies" section, there are five problems, all involving conversions of a token stream to a string and then "parsing" the resulting string. Annoying stuff, and I'm not sure how to proceed.

atspi-proxies-0.1.0

It uses atspi-macros-0.4.0, which has this code in src/proxy.rs:

// TODO: this is sketchy as all hell 
// it replaces all mentions of zbus::Result with the Generic std::result::Result, then, adds the Self::Error error type to the second part of the generic
// finally, it replaces all mentions of (String, zbus :: zvairnat :: OwnedObjectPath) with &Self.
// this menas that implementors will need to return a borrowed value of the same type to comply with the type system.
// unsure if this will hold up over time.
fn genericize_method_return_type(rt: &ReturnType) -> TokenStream {
        let original = format!("{}", rt.to_token_stream());
        let mut generic_result = original.replace("zbus :: Result", "std :: result :: Result");
        let end_of_str = generic_result.len();
        generic_result.insert_str(end_of_str - 2, ", Self :: Error");
        let mut generic_impl = generic_result.replace(OBJECT_PAIR_NAME, "Self");
        generic_impl.push_str(" where Self: Sized");
        TokenStream::from_str(&generic_impl).expect("Could not genericize zbus method/property/signal. Attempted to turn \"{generic_result}\" into a TokenStream.")
}

The spacing of zbus :: Result has changed to zbus::Result, which breaks this code.

awto-0.1.2

It uses awto-macros-0.1.2, which has this code in src/proc_macros/schema/database_table.rs:

                let db_type_is_text = ty.to_string().ends_with(":: Text");
                if let Some(max_len) = &field.attrs.max_len {
                    if !db_type_is_text {
                        return Err(syn::Error::new(
                            max_len.span(),
                            "max_len can only be used on varchar & char types",
                        ));
                    }
                    ty = quote!(#ty(Some(#max_len)));
                } else if db_type_is_text {
                    ty = quote!(#ty(None));
                }

The change in spacing of :: Text to ::Text breaks this.

ink-analyzer-ir-0.7.0

crates/macro/src/from_ast.rs has this:

pub fn impl_from_ast(ast: &syn::DeriveInput) -> syn::Result<TokenStream> {                                                                            
    let name = &ast.ident;                                                                                                                            
                                                                                                                                                      
    if let Some(fields) = utils::parse_struct_fields(ast) {                                                                                           
        if let Some(ast_field) = utils::find_field(fields, "ast") {                                                                                   
            let ir_crate_path = utils::get_normalized_ir_crate_path();                                                                                
            let ast_field_type = &ast_field.ty;                                                                                                       
            let ast_type = if ast_field_type                                                                                                          
                .to_token_stream()                                                                                                                    
                .to_string()                                                                                                                          
                .starts_with("ast ::")
            {                                                                                                                                         
                quote! { #ast_field_type }                                                                                                            
            } else {                                                                                                                                  
                quote! { ast::#ast_field_type }                                                                                                       
            };  

The change of spacing from ast :: to ast:: breaks this.

kcl-lib-0.1.35

It uses derive-docs-0.4.0 which has this code in src/lib.rs:

    let ret_ty = ast.sig.output.clone();
    let ret_ty_string = ret_ty
        .into_token_stream()
        .to_string()
        .replace("-> ", "")
        .replace("Result < ", "")
        .replace(", KclError >", "");
    let return_type = if !ret_ty_string.is_empty() {
        let ret_ty_string = if ret_ty_string.starts_with("Box <") {
            ret_ty_string
                .trim_start_matches("Box <")
                .trim_end_matches('>')
                .trim()
                .to_string()
        } else {
            ret_ty_string.trim().to_string()
        };

An example return type changes from this:

-> Result < Box < ExtrudeGroup >, KclError > {}

to this:

-> Result < Box < ExtrudeGroup > , KclError > {}

And the space inserted between the > and , breaks the parsing, because trim_end_matches('>') fails due to the trailing space.

pagetop-0.0.46

This uses pagetop-macros-0.0.5, which has this code in in src/lib.rs:

    let args: Vec<String> = fn_item                                                                                                  
        .sig                                                                                                                         
        .inputs                                                                                                                      
        .iter()                                                                                                                      
        .skip(1)                                                                                                                     
        .map(|arg| arg.to_token_stream().to_string())                                                                                
        .collect();                                                                                                                  
                                                                                                                                     
    let param: Vec<String> = args                                                                                                    
        .iter()                                                                                                                      
        .map(|arg| arg.split_whitespace().next().unwrap().to_string())                                                               
        .collect();                                                                                                                  
                                                                                                                                     
    #[rustfmt::skip]                                                                                                                 
    let fn_with = parse_str::<ItemFn>(concat_string!("                                                                               
        pub fn ", fn_name.replace("alter_", "with_"), "(mut self, ", args.join(", "), ") -> Self {                                   
            self.", fn_name, "(", param.join(", "), ");                                                                              
            self                                                                                                                     
        }                                                                                                                            
    ").as_str()).unwrap();                                                                                                           

On a signature like this:

    pub fn alter_value(&mut self, value: impl Into<String>) -> &mut Self {                                                           

the old pretty printer printed value :, the new one does value:, so the arg.split_whitespace().next() returns value: instead of value, which messes up the subsequent code.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 6, 2023
@petrochenkov
Copy link
Contributor

I think we can follow the usual breaking change procedure here.

If the regressed crate is alive - send a fix.
Stuff like contains("foo :: bar") can be trivially replaced with contains("foo :: bar") || contains("foo::bar").

If the regressed crate is also popular (e.g. responsible for the largest number of regressions in the report), then special case it in the compiler.
E.g. do not remove spaces around :: if the neighboring tokens are foo and bar.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 7, 2023
@nnethercote
Copy link
Contributor Author

Interestingly, the joint-based pretty printer in #114571 doesn't break the five examples above. Presumably because the affected token streams come from proc macros, and #114571 doesn't make changes to how those are printed.

Now that I better understand the effects of these kinds of changes on real world code, I have the following plan:

  • Merge the first two commits from this PR in Clarify space_between #117698, because they're useful and make no functional changes.
  • Give up on improving space_between, or scale it back to just the changes that don't cause errors.
  • Return to the joint-based approach in Improve print_tts #114571.

@nnethercote
Copy link
Contributor Author

Now that I better understand the effects of these kinds of changes on real world code, I have the following plan:

  • Merge the first two commits from this PR in Clarify space_between #117698, because they're useful and make no functional changes.
  • Give up on improving space_between, or scale it back to just the changes that don't cause errors.
  • Return to the joint-based approach in Improve print_tts #114571.

These have all been done.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants