[WIP] Token-based outer attributes handling #76130
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)
In PR rust-lang#76130, I add a fourth field, which makes using a tuple variant somewhat unwieldy.
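The rationale for factoring the fields into a `MacCallStmt` struct can be illustrated with a simplified, hypothetical sketch (the field names and types below are made up for illustration, not rustc's actual definitions):

```rust
// Hypothetical sketch: with a tuple variant, call sites must remember
// field order, which gets worse as fields are added. Wrapping the fields
// in a named struct keeps each one self-documenting.

#[derive(Debug)]
enum StmtKindTuple {
    // Which field is which? A fourth field makes this even harder to read.
    MacCall(String, bool, Vec<String>),
}

#[derive(Debug, PartialEq)]
struct MacCallStmt {
    mac: String,            // placeholder for the macro call
    style: bool,            // placeholder for the statement style
    attrs: Vec<String>,     // placeholder for attributes
    tokens: Option<String>, // a new fourth field stays readable here
}

#[derive(Debug)]
enum StmtKindStruct {
    MacCall(MacCallStmt),
}

fn main() {
    // Tuple variant: positional, easy to mix up.
    let t = StmtKindTuple::MacCall("my_macro".to_string(), true, vec![]);
    if let StmtKindTuple::MacCall(_name, _style, _attrs) = t {}

    // Struct variant: every field is named at the construction site.
    let s = StmtKindStruct::MacCall(MacCallStmt {
        mac: "my_macro".to_string(),
        style: true,
        attrs: vec![],
        tokens: None,
    });
    let StmtKindStruct::MacCall(m) = s;
    assert_eq!(m.mac, "my_macro");
}
```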
…nkov Factor out StmtKind::MacCall fields into `MacCallStmt` struct In PR rust-lang#76130, I add a fourth field, which makes using a tuple variant somewhat unwieldy.
force-pushed from 602b066 to 083792b
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 083792bcf67d1e50afc9ad50500ab011d34b0bee with merge 30ddab741f8c02392c4383445436138f5a6ec53d...
☀️ Try build successful - checks-actions, checks-azure
Queued 30ddab741f8c02392c4383445436138f5a6ec53d with parent 80cacd7, future comparison URL.
Finished benchmarking try commit (30ddab741f8c02392c4383445436138f5a6ec53d): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying `@bors rollup-`. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never
force-pushed from 083792b to 23fd9fb
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 23fd9fbd9d07096ca254592013a42b4cae12ba29 with merge b3535fba76a3fc7202284b7699afff8f8831e461...
☀️ Try build successful - checks-actions, checks-azure
Queued b3535fba76a3fc7202284b7699afff8f8831e461 with parent 42d896a, future comparison URL.
Finished benchmarking try commit (b3535fba76a3fc7202284b7699afff8f8831e461): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying `@bors rollup-`. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the rollup. @bors rollup=never
@Aaron1011
…ochenkov Refactor AST pretty-printing to allow skipping insertion of extra parens

Fixes rust-lang#75734. Makes progress towards rust-lang#43081. Unblocks PR rust-lang#76130.

When pretty-printing an AST node, we may insert additional parentheses to ensure that precedence is properly preserved in the code we output. However, the proc macro implementation relies on comparing a pretty-printed AST node to the captured `TokenStream`. Inserting extra parentheses changes the structure of the reparsed `TokenStream`, making the comparison fail.

This PR refactors the AST pretty-printing code to allow skipping the insertion of additional parentheses. Several freestanding methods are moved to trait methods on `PrintState`, which keeps track of an internal `insert_extra_parens` flag. This flag is normally `true`, but we expose a public method which allows pretty-printing a nonterminal with `insert_extra_parens = false`. To avoid changing the public interface of `rustc_ast_pretty`, the freestanding `_to_string` methods are changed to delegate to a newly-created `State`. The main pretty-printing code is moved to a new `state` module to ensure that it does not accidentally call any of these public helper functions (instead, the internal functions with the same name should be used).
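The parenthesization problem described above can be sketched with a toy printer and re-lexer (all names here are made up for illustration; this is not rustc's actual API):

```rust
// Hypothetical sketch: pretty-printing may add parentheses for precedence,
// which changes the token sequence produced by re-lexing the output, so a
// comparison against the originally captured tokens fails.

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Print the expression; if `insert_extra_parens` is set, wrap compound
// expressions in parentheses (what a precedence-preserving printer does).
fn print(e: &Expr, insert_extra_parens: bool) -> String {
    match e {
        Expr::Num(n) => n.to_string(),
        Expr::Add(a, b) => {
            let s = format!(
                "{} + {}",
                print(a, insert_extra_parens),
                print(b, insert_extra_parens)
            );
            if insert_extra_parens {
                format!("({})", s)
            } else {
                s
            }
        }
    }
}

// A toy "re-lexer": split the printed source into tokens.
fn lex(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

fn main() {
    let e = Expr::Add(Box::new(Expr::Num(1)), Box::new(Expr::Num(2)));
    let captured = lex("1 + 2"); // tokens as originally written
    // With extra parens, the re-lexed tokens no longer match the capture:
    assert_ne!(lex(&print(&e, true)), captured);
    // Skipping the extra parens restores an exact match:
    assert_eq!(lex(&print(&e, false)), captured);
}
```

This is the same shape of fix the PR describes: thread a flag through the printer so callers comparing against captured tokens can opt out of the extra parentheses.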
Blocked on #77250
…on, r=petrochenkov Rewrite `collect_tokens` implementations to use a flattened buffer

Instead of trying to collect tokens at each depth, we 'flatten' the stream as we go along, pushing open/close delimiters to our buffer just like regular tokens. Once capturing is complete, we reconstruct a nested `TokenTree::Delimited` structure, producing a normal `TokenStream`.

The reconstructed `TokenStream` is not created immediately - instead, it is produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This closure stores a clone of the original `TokenCursor`, plus a record of the number of calls to `next()/next_desugared()`. This is sufficient to reconstruct the tokenstream seen by the callback without storing any additional state. If the tokenstream is never used (e.g. when a captured `macro_rules!` argument is never passed to a proc macro), we never actually create a `TokenStream`.

This implementation has a number of advantages over the previous one:
* It is significantly simpler, with no edge cases around capturing the start/end of a delimited group.
* It can be easily extended to allow replacing tokens at an arbitrary 'depth' by just using `Vec::splice` at the proper position. This is important for PR rust-lang#76130, which requires us to track information about attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily parse an AST struct, and then decide after the fact whether we need a `TokenStream`. This will be useful when we start collecting tokens for `Attribute` - we can discard the `LazyTokenStream` if the parsed attribute doesn't need tokens (e.g. is a builtin attribute).

The performance impact seems to be negligible (see rust-lang#77250 (comment)). There is a small slowdown on a few benchmarks, but it only rises above 1% for incremental builds, where it represents a larger fraction of the much smaller instruction count. There is a ~1% speedup on a few other incremental benchmarks - my guess is that the speedups and slowdowns will usually cancel out in practice.
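The flatten-then-repackage scheme described above can be sketched with simplified stand-in types (not rustc's actual `TokenStream` machinery):

```rust
// Simplified sketch: delimiters are pushed into a flat Vec just like
// ordinary tokens, and the nested tree is reconstructed losslessly
// afterwards via a stack of partially-built groups.

#[derive(Debug, Clone, PartialEq)]
enum FlatToken {
    Token(String),
    OpenDelim,
    CloseDelim,
}

#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    Token(String),
    Delimited(Vec<TokenTree>),
}

// Re-package a flat buffer into the nested `Delimited` structure.
fn reconstruct(flat: &[FlatToken]) -> Vec<TokenTree> {
    // The bottom stack entry accumulates the top-level result;
    // each OpenDelim starts a new in-progress group on top.
    let mut stack: Vec<Vec<TokenTree>> = vec![Vec::new()];
    for t in flat {
        match t {
            FlatToken::Token(s) => {
                stack.last_mut().unwrap().push(TokenTree::Token(s.clone()))
            }
            FlatToken::OpenDelim => stack.push(Vec::new()),
            FlatToken::CloseDelim => {
                let group = stack.pop().unwrap();
                stack.last_mut().unwrap().push(TokenTree::Delimited(group));
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // `foo ( bar )` captured as a flat buffer:
    let flat = vec![
        FlatToken::Token("foo".into()),
        FlatToken::OpenDelim,
        FlatToken::Token("bar".into()),
        FlatToken::CloseDelim,
    ];
    let tree = reconstruct(&flat);
    assert_eq!(
        tree,
        vec![
            TokenTree::Token("foo".into()),
            TokenTree::Delimited(vec![TokenTree::Token("bar".into())]),
        ]
    );
}
```

Because the buffer is flat, replacing tokens at any nesting depth is a single `Vec::splice` over a contiguous range, which is what makes the approach easy to extend for attribute tracking.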
Triage: A reminder that #77250 has been merged, so it seems maybe this can continue...
This is currently blocked on further work on statement attributes
Closing in favor of #80689
Makes progress towards #43081
We now capture tokens for attributes, and remove them from the tokenstream when applying `#[cfg]` attributes (in addition to modifying the AST). As a result, derive proc-macros now receive the exact input (instead of the pretty-printed/retokenized input), even when fields/variants get `#[cfg]`-stripped.

Several changes are made in order to accomplish this:
* `PreexpTokenStream` and `PreexpTokenTree` are added. These are identical to `TokenStream` and `TokenTree`, with the exception of an `OuterAttributes` variant in `PreexpTokenTree`. This is used to represent captured attributes, allowing the target tokens to be removed by `#[cfg]`-stripping (and when invoking attribute macros).
* Tokens are now captured for `Attribute`. This allows us to remove `prepend_attributes`, which required pretty-printing/retokenizing attributes in certain cases.
* `collect_tokens` was rewritten. The implementation is now much simpler - instead of keeping track of nested `TokenTree::Delimited` at various depths, we collect all tokens into a flat buffer (e.g. we push `TokenKind::OpenDelim` and `TokenKind::CloseDelim`). After capturing, we losslessly re-package these tokens back into the normal `TokenTree::Delimited` structure.
* We store a `PreexpTokenStream` on AST structs instead of a plain `TokenStream`. This contains both the attributes and the target itself.
* `parse_outer_attributes` now passes the parsed attributes to a closure, instead of simply returning them. The closure is responsible for parsing the attribute target, and allows us to automatically construct the proper `PreexpTokenStream`.
* `HasAttrs` now has a `visit_tokens` method. This is used during `#[cfg]`-stripping to allow us to remove attribute targets from the `PreexpTokenStream`.

This PR is quite large (though ~1000 lines are tests). I opened it mainly to show how all of the individual pieces fit together, since some of them (e.g. the `parse_outer_attributes` change) don't make sense if we're not going to land this PR. However, many pieces are mergeable on their own - I plan to split them into their own PRs.

TODO:
* Some errors for `#[cfg]`/`#[cfg_attr]` are duplicated. This is because we run the same code for both the AST-based attributes and the token-based attributes. We could possibly skip validating the token-based attributes, but I was worried about accidentally skipping gating if the pretty-print/reparse check ever fails. These are deduplicated by the diagnostic infrastructure outside of compiletest, but it would be nice to fix this.
* Inner attributes are stored in the `PreexpTokenStream` alongside outer attributes, which allows us to handle `#![cfg(FALSE)]` and remove its parent. This should be good enough to handle all stable uses of inner attributes that are observable by proc-macros (currently, just cfg-stripping before `#[derive]` macros are invoked). Custom inner attributes are not handled.
* `Parser::parse_stmt_without_recovery` does not eat a trailing semicolon, which means we will not capture it in the tokenstream. I need to either refactor statement parsing to eat the semicolon inside `parse_outer_attributes_with_tokens`, or manually adjust the captured tokens.
* Token collection is currently performed unconditionally (even when it is not needed - e.g. no `cfg`, `cfg_attr`, `derive`, or any non-builtin attribute). There are many scenarios where we can skip token collection for some/all AST structs (no `derive` on an item, no `cfg` inside an item, no custom attributes on an item, etc.)
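As a rough sketch of the `PreexpTokenStream`/`OuterAttributes` idea and `#[cfg]`-stripping described in this PR, assuming heavily simplified stand-in types (these are not the real rustc definitions):

```rust
// Simplified sketch: captured attributes keep their target's tokens
// attached, so #[cfg]-stripping can remove an entire target from the
// token stream without pretty-printing/retokenizing anything.

#[derive(Debug, Clone, PartialEq)]
enum PreexpTokenTree {
    Token(String),
    // An attribute (e.g. `cfg(FALSE)`) together with its target's tokens.
    OuterAttributes {
        attr: String,
        target: Vec<PreexpTokenTree>,
    },
}

// Strip every target whose attribute is `cfg(FALSE)`; otherwise keep the
// target's tokens (attribute tokens themselves are omitted for brevity).
fn cfg_strip(stream: Vec<PreexpTokenTree>) -> Vec<PreexpTokenTree> {
    let mut out = Vec::new();
    for tree in stream {
        match tree {
            PreexpTokenTree::OuterAttributes { attr, target } => {
                if attr != "cfg(FALSE)" {
                    out.extend(cfg_strip(target));
                }
            }
            tok => out.push(tok),
        }
    }
    out
}

fn main() {
    let stream = vec![
        PreexpTokenTree::Token("struct".into()),
        PreexpTokenTree::Token("Foo".into()),
        PreexpTokenTree::OuterAttributes {
            attr: "cfg(FALSE)".into(),
            target: vec![PreexpTokenTree::Token("removed_field".into())],
        },
        PreexpTokenTree::Token("kept_field".into()),
    ];
    let stripped = cfg_strip(stream);
    assert_eq!(
        stripped,
        vec![
            PreexpTokenTree::Token("struct".into()),
            PreexpTokenTree::Token("Foo".into()),
            PreexpTokenTree::Token("kept_field".into()),
        ]
    );
}
```

The key property is that stripping operates on the captured tokens directly, so a derive macro downstream sees exactly the tokens that were written, minus the stripped targets.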