Improve `print_tts` by making `space_between` smarter #117433
This PR changes punctuation jointness in many cases, something I advised against in the previous similar PR - #97340 (comment). After #114571 lands and after So, right now, I think, that would be equivalent to never using |
Update: when both |
bae356b to 6e849e8
Ok, I have updated the code to never remove the space between adjacent punctuation tokens. The following cases are worse:
but overall it's not too bad, and still a lot better than the current output (though The existing I ended up merging all the previous commits that change the behaviour of |
/// Should two consecutive token trees be printed with a space between them?
///
/// NOTE: should always be false if both token trees are punctuation, so that
I don't think this function should return anything in this case, it should rather have `assert!(!is_punct(tt1) || !is_punct(tt2))` at the start instead.
The decision should be made based on `Spacing` in that case, and we should never reach this function.
Or the `Spacing`-based decision can be made inside this function, then it will be `if is_punct(tt1) && is_punct(tt2) { ... }` at the start instead of the assert.
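The two alternatives suggested in this review thread can be sketched on a toy token model. Everything here (`Tok`, the two-variant `Spacing`, `heuristic`, and both function names) is a hypothetical stand-in for illustration and is much simpler than the compiler's real types; the rule that a `Joint` punctuation token is printed without a following space is likewise an assumption of this sketch.

```rust
/// Toy stand-in for the compiler's `Spacing` (assumption: only two variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Spacing {
    Alone,
    Joint,
}

/// Toy token, far simpler than a real token tree.
#[derive(Clone, Copy, Debug)]
enum Tok {
    Punct(&'static str, Spacing),
    Other(&'static str),
}

fn is_punct(t: &Tok) -> bool {
    matches!(t, Tok::Punct(..))
}

/// Text of a punctuation token, or `None` for anything else.
fn punct_text(t: &Tok) -> Option<&'static str> {
    match t {
        Tok::Punct(s, _) => Some(*s),
        Tok::Other(_) => None,
    }
}

/// Placeholder heuristic for pairs with at least one non-punctuation token.
/// Assumption: no space before `;` or `,`, a space everywhere else.
fn heuristic(_tt1: &Tok, tt2: &Tok) -> bool {
    !matches!(punct_text(tt2), Some(";") | Some(","))
}

/// Alternative 1: punctuation pairs must never reach this function.
fn space_between_asserting(tt1: &Tok, tt2: &Tok) -> bool {
    assert!(!is_punct(tt1) || !is_punct(tt2));
    heuristic(tt1, tt2)
}

/// Alternative 2: punctuation pairs are decided from `Spacing` up front.
fn space_between_with_spacing(tt1: &Tok, tt2: &Tok) -> bool {
    if is_punct(tt1) && is_punct(tt2) {
        if let Tok::Punct(_, spacing) = tt1 {
            // A joint token is printed glued to the next one.
            return *spacing == Spacing::Alone;
        }
    }
    heuristic(tt1, tt2)
}
```

In the first variant the caller is responsible for never passing two punctuation tokens; in the second the function itself stays total and the heuristic only sees the remaining pairs.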
I would prefer to avoid making any pretty-printing decisions based on `Spacing` in this PR. We can leave those to #114571, which will change how `space_between` is called. I plan to add the `Spacing::Unknown` in that PR, for tokens coming from proc macros. Those will be the cases where `space_between` is used.
With that decided, the current position of the assertion has the advantage that it's only checked in the case where `space_between` returns false.
So I think this is good enough to merge, or do a crater run if you think that is necessary.
Ok.
Crater run is needed in any case.
To avoid `!matches!(...)`, which is hard to think about. Instead every case now uses direct pattern matching and returns true or false. Also add a couple of cases to the `stringify.rs` test that currently print badly.
We currently do the wrong thing on a lot of these. The next commit will fix things.
As well as nicer output, this fixes several FIXMEs in `tests/ui/macros/stringify.rs`, where the current output is sub-optimal.
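The first commit message above contrasts a negated `matches!` with direct pattern matching. A minimal sketch of the difference, using a hypothetical `Kind` enum and hypothetical function names rather than the compiler's real token types:

```rust
/// Hypothetical token kinds, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Ident,
    Semi,
    Comma,
    Dot,
}

/// Negated style: the reader has to invert the pattern mentally.
fn space_between_negated(prev: Kind, next: Kind) -> bool {
    !matches!(
        (prev, next),
        (Kind::Ident, Kind::Semi | Kind::Comma | Kind::Dot)
    )
}

/// Direct style: each arm names a case and states its result.
fn space_between_direct(prev: Kind, next: Kind) -> bool {
    match (prev, next) {
        // NON-PUNCT + `;`, e.g. `x = 3;`
        (Kind::Ident, Kind::Semi) => false,
        // NON-PUNCT + `,`, e.g. `x, y`
        (Kind::Ident, Kind::Comma) => false,
        // NON-PUNCT + `.`, e.g. `x.y`
        (Kind::Ident, Kind::Dot) => false,
        // Everything else keeps its space.
        _ => true,
    }
}
```

Both functions compute the same answer for every pair; the direct form leaves room for a comment with an example next to each case.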
6e849e8 to 9b9f8f0
// NON-PUNCT + `;`: `x = 3;`, `[T; 3]`
// NON-PUNCT + `.`: `x.y`, `tup.0`
// NON-PUNCT + `:`: `'a: loop { ... }`, `x: u8`, `where T: U`,
// `<Self as T>::x`, `Trait<'a>: Sized`, `X<Y<Z>>: Send`,
These examples still involve punctuation and need an update.
@bors try
Improve `print_tts` by making `space_between` smarter

`space_between` currently handles a few cases that make the output nicer. It also gets some cases wrong. This PR fixes the wrong cases, and adds a bunch of extra cases, resulting in prettier output. E.g. these lines:
```
use smallvec :: SmallVec ;
assert! (mem :: size_of :: < T > () != 0) ;
```
become these lines:
```
use smallvec::SmallVec;
assert!(mem::size_of:: < T >() != 0);
```
This overlaps with rust-lang#114571, but this PR has the crucial characteristic of giving the same results for all token streams, including those generated by proc macros. For that reason I think it's worth having even if/when rust-lang#114571 is merged. It's also nice that this PR's improvements can be obtained by modifying only `space_between`.

r? `@petrochenkov`
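To make the before/after output concrete, here is a toy printer that inserts a space only when a `space_between` predicate asks for one. The `Tok` type, the rule set in `space_smarter`, and this `print_tts` are assumptions made for illustration; the real pretty printer works on token trees and handles many more cases.

```rust
/// Hypothetical flat token model: a word or a punctuation token.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tok {
    Word(&'static str),
    Punct(&'static str),
}

fn text(t: Tok) -> &'static str {
    match t {
        Tok::Word(s) | Tok::Punct(s) => s,
    }
}

/// Old-style behaviour in this sketch: always separate tokens.
fn space_always(_a: Tok, _b: Tok) -> bool {
    true
}

/// A few "smarter" rules (a tiny, assumed subset).
fn space_smarter(a: Tok, b: Tok) -> bool {
    match (a, b) {
        // Adjacent punctuation always keeps its space.
        (Tok::Punct(_), Tok::Punct(_)) => true,
        // NON-PUNCT + `;`, e.g. `x = 3;`
        (Tok::Word(_), Tok::Punct(";")) => false,
        // Paths, e.g. `smallvec::SmallVec`
        (Tok::Word(_), Tok::Punct("::")) => false,
        (Tok::Punct("::"), Tok::Word(_)) => false,
        _ => true,
    }
}

/// Print tokens, asking `space_between` about each adjacent pair.
fn print_tts(tts: &[Tok], space_between: fn(Tok, Tok) -> bool) -> String {
    let mut out = String::new();
    for (i, &t) in tts.iter().enumerate() {
        if i > 0 && space_between(tts[i - 1], t) {
            out.push(' ');
        }
        out.push_str(text(t));
    }
    out
}
```

Feeding the tokens of `use smallvec::SmallVec;` through `print_tts` with `space_always` gives the spaced-out form, and with `space_smarter` gives the compact form, mirroring the first example line in the description.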
☀️ Try build successful - checks-actions
@craterbot check
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more
🎉 Experiment
There are some legitimate regressions here, waiting on author to triage.
I still don't understand how to read crater reports properly. But looking at the "regressed: dependencies" section, there are five problems, all involving conversions of a token stream to a string and then "parsing" the resulting string. Annoying stuff, and I'm not sure how to proceed.

atspi-proxies-0.1.0

It uses
// TODO: this is sketchy as all hell
// it replaces all mentions of zbus::Result with the Generic std::result::Result, then, adds the Self::Error error type to the second part of the generic
// finally, it replaces all mentions of (String, zbus :: zvairnat :: OwnedObjectPath) with &Self.
// this menas that implementors will need to return a borrowed value of the same type to comply with the type system.
// unsure if this will hold up over time.
fn genericize_method_return_type(rt: &ReturnType) -> TokenStream {
    let original = format!("{}", rt.to_token_stream());
    let mut generic_result = original.replace("zbus :: Result", "std :: result :: Result");
    let end_of_str = generic_result.len();
    generic_result.insert_str(end_of_str - 2, ", Self :: Error");
    let mut generic_impl = generic_result.replace(OBJECT_PAIR_NAME, "Self");
    generic_impl.push_str(" where Self: Sized");
    TokenStream::from_str(&generic_impl).expect("Could not genericize zbus method/property/signal. Attempted to turn \"{generic_result}\" into a TokenStream.")
}

The spacing of

awto-0.1.2

It uses
let db_type_is_text = ty.to_string().ends_with(":: Text");
if let Some(max_len) = &field.attrs.max_len {
    if !db_type_is_text {
        return Err(syn::Error::new(
            max_len.span(),
            "max_len can only be used on varchar & char types",
        ));
    }
    ty = quote!(#ty(Some(#max_len)));
} else if db_type_is_text {
    ty = quote!(#ty(None));
}

The change in spacing of

ink-analyzer-ir-0.7.0
pub fn impl_from_ast(ast: &syn::DeriveInput) -> syn::Result<TokenStream> {
    let name = &ast.ident;
    if let Some(fields) = utils::parse_struct_fields(ast) {
        if let Some(ast_field) = utils::find_field(fields, "ast") {
            let ir_crate_path = utils::get_normalized_ir_crate_path();
            let ast_field_type = &ast_field.ty;
            let ast_type = if ast_field_type
                .to_token_stream()
                .to_string()
                .starts_with("ast ::")
            {
                quote! { #ast_field_type }
            } else {
                quote! { ast::#ast_field_type }
            };

The change of spacing from

kcl-lib-0.1.35

It uses
let ret_ty = ast.sig.output.clone();
let ret_ty_string = ret_ty
    .into_token_stream()
    .to_string()
    .replace("-> ", "")
    .replace("Result < ", "")
    .replace(", KclError >", "");
let return_type = if !ret_ty_string.is_empty() {
    let ret_ty_string = if ret_ty_string.starts_with("Box <") {
        ret_ty_string
            .trim_start_matches("Box <")
            .trim_end_matches('>')
            .trim()
            .to_string()
    } else {
        ret_ty_string.trim().to_string()
    };

An example return type changes from this:
-> Result < Box < ExtrudeGroup >, KclError > {}
to this:
-> Result < Box < ExtrudeGroup > , KclError > {}
And the space inserted between the

pagetop-0.0.46

This uses
let args: Vec<String> = fn_item
    .sig
    .inputs
    .iter()
    .skip(1)
    .map(|arg| arg.to_token_stream().to_string())
    .collect();
let param: Vec<String> = args
    .iter()
    .map(|arg| arg.split_whitespace().next().unwrap().to_string())
    .collect();
#[rustfmt::skip]
let fn_with = parse_str::<ItemFn>(concat_string!("
pub fn ", fn_name.replace("alter_", "with_"), "(mut self, ", args.join(", "), ") -> Self {
self.", fn_name, "(", param.join(", "), ");
self
}
").as_str()).unwrap();

On a signature like this:
the old pretty printer printed
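All five regressions share one pattern: the crate compares the printed token stream against a string literal with hard-coded spacing. One possible mitigation on the crate side, sketched here with hypothetical helper names (`squash`, `tokens_end_with`) and a made-up `db :: Text` path, is to compare with whitespace removed. This is only an illustration of the failure mode, not what any of the affected crates actually did; inspecting the parsed syntax tree instead of a string would be more robust still.

```rust
/// Remove all whitespace so that `a :: b` and `a::b` compare equal.
/// Caveat: this also merges `a b` into `ab`, so it only suits checks
/// where the tokens of interest are separated by punctuation.
fn squash(s: &str) -> String {
    s.chars().filter(|c| !c.is_whitespace()).collect()
}

/// Whitespace-insensitive `ends_with` on stringified tokens.
fn tokens_end_with(printed: &str, suffix: &str) -> bool {
    squash(printed).ends_with(squash(suffix).as_str())
}
```

With this helper, a check such as `tokens_end_with(&ty.to_string(), ":: Text")` would give the same answer whether the printer emits `db :: Text` or `db::Text`, and the two `Result < Box < ExtrudeGroup > ...` spellings shown above squash to the same string.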
I think we can follow the usual breaking change procedure here. If the regressed crate is alive - send a fix. If the regressed crate is also popular (e.g. responsible for the largest number of regressions in the report), then special case it in the compiler.
@rustbot author
Interestingly, the joint-based pretty printer in #114571 doesn't break the five examples above. Presumably because the affected token streams come from proc macros, and #114571 doesn't make changes to how those are printed. Now that I better understand the effects of these kinds of changes on real world code, I have the following plan:
These have all been done.