Skip to content

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
dtolnay opened this issue Apr 2, 2018 · 9 comments · Fixed by #49597
Closed

Incorrect and inconsistent jointness of tokens in desugared doc comment #49596

dtolnay opened this issue Apr 2, 2018 · 9 comments · Fixed by #49597
Assignees
Labels
A-decl-macros-2-0 Area: Declarative macros 2.0 (#39412)

Comments

@dtolnay
Copy link
Member

dtolnay commented Apr 2, 2018

I have not minimized this yet but in TeXitoi/structopt#88 we are seeing inexplicable behavior when iterating over tokens of a struct field doc comment. Related to #49545 so mentioning @alexcrichton.

One of their test cases contains the following struct.

/// Lorem ipsum
#[derive(StructOpt, PartialEq, Debug)]
struct LoremIpsum {
    /// Fooify a bar
    /// and a baz
    #[structopt(short = "f", long = "foo")]
    foo: bool,
}

Within their macro implementation we are seeing the desugared doc comment of /// Fooify a bar having an Alone spacing:

Op(Op { op: '=', spacing: Alone, span: Span(Span { lo: BytePos(0), hi: BytePos(0), ctxt: #0 }) })
Literal(Literal(Literal(Str_(/// Fooify a bar), None)))

while the desugared /// and a baz has a Joint spacing. I believe the Joint is incorrect because only an Op followed by another Op should be able to have Joint spacing.

Op(Op { op: '=', spacing: Joint, span: Span(Span { lo: BytePos(0), hi: BytePos(0), ctxt: #0 }) })
Literal(Literal(Literal(Str_(/// and a baz), None)))

I stuck the following loop at the top of their derive entry point:

    for tt in input.clone() {
        println!("{:#?}", tt);
    }

and it indicates that the first doc comment has kind: Tree while the second has kind: JointTree.

                        TokenStream {
                            kind: Tree(
                                Token(
                                    Span {
                                        lo: BytePos(
                                            0
                                        ),
                                        hi: BytePos(
                                            0
                                        ),
                                        ctxt: #0
                                    },
                                    DocComment(
                                        /// Fooify a bar
                                    )
                                )
                            )
                        },
                        TokenStream {
                            kind: JointTree(
                                Token(
                                    Span {
                                        lo: BytePos(
                                            0
                                        ),
                                        hi: BytePos(
                                            0
                                        ),
                                        ctxt: #0
                                    },
                                    DocComment(
                                        /// and a baz
                                    )
                                )
                            )
                        },
@alexcrichton
Copy link
Member

@dtolnay I think I know what's happening here in terms of where the Joint is coming from but I'm also a little confused as to where TeXitoi/structopt#88 is happening.

I was unable to get JointTree or Joint to show up though unfortunately in a smaller hello-world test with just adding some doc comments.

I'll see what I can about the spurious Joint at least though

@TeXitoi
Copy link
Contributor

TeXitoi commented Apr 2, 2018

The bug has nothing related to the PR, that's just the first time the error is reported. I can reproduce the failed assertion on master.

But there is a description of what happen in the PR. Basically, there is a 2 line doc comment, but only the first one is processed.

@TeXitoi
Copy link
Contributor

TeXitoi commented Apr 2, 2018

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

Yeah I couldn't reproduce this in a straightforward minimized macro either, but checking out the structopt PR at commit 5000840d1a6b79cf704caa927b4f5b15d29978c9 and cargo test --features nightly does reproduce it. I am curious what you find!

Their problem is here + here where a Joint doc = "..." is not considered a legal Meta.

@alexcrichton
Copy link
Member

Ok thanks for the info! I've confirmed that this is fixed by #49597 where I jiggered things around a bit. Namely it appears that the Alone spacing is indeed what we need to fix this.

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

Repro script

#!/bin/sh

cargo new --lib structopt_derive
cargo new --lib repro

echo >structopt_derive/src/lib.rs '
#![feature(proc_macro)]

extern crate proc_macro;
use proc_macro::{TokenStream, TokenTreeIter, TokenTree, TokenNode, Spacing};

#[proc_macro_derive(StructOpt)]
pub fn derive_structopt(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();
    assert_eq!(iter.next().unwrap().to_string(), "struct");
    assert_eq!(iter.next().unwrap().to_string(), "S");
    let mut inner = unwrap_group(iter.next().unwrap());

    assert_eq!(inner.next().unwrap().to_string(), "#");
    let mut x = unwrap_group(inner.next().unwrap());
    assert_eq!(x.next().unwrap().to_string(), "doc");
    println!("{:?} {}", unwrap_spacing(x.next().unwrap()), x.next().unwrap());

    assert_eq!(inner.next().unwrap().to_string(), "#");
    let mut y = unwrap_group(inner.next().unwrap());
    assert_eq!(y.next().unwrap().to_string(), "doc");
    println!("{:?} {}", unwrap_spacing(y.next().unwrap()), y.next().unwrap());

    TokenStream::empty()
}

fn unwrap_group(tt: TokenTree) -> TokenTreeIter {
    match tt.kind {
        TokenNode::Group(_, s) => s.into_iter(),
        _ => unimplemented!(),
    }
}

fn unwrap_spacing(tt: TokenTree) -> Spacing {
    match tt.kind {
        TokenNode::Op(_, s) => s,
        _ => unimplemented!(),
    }
}
'

echo >>structopt_derive/Cargo.toml '
[lib]
proc-macro = true
'

echo >repro/src/lib.rs '
#![allow(dead_code)]

#[macro_use]
extern crate structopt_derive;

fn f() {
    #[derive(StructOpt)]
    struct S {
        /// X
        /// Y
        #[doc(hidden)]
        foo: bool,
    }
}
'

echo >>repro/Cargo.toml '
structopt_derive = { path = "../structopt_derive" }
'

cargo build --manifest-path repro/Cargo.toml

Output

Alone "/// X"
Joint "/// Y"

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

Interestingly in the script if you move struct S outside of the fn f, then the output is correct.

Alone "/// X"
Alone "/// Y"

@alexcrichton when you say "jiggered things around a bit" is that of the intentional sort or the accidentally-fixed-it sort? It would be good to understand why the token stream was different depending on whether the macro is invoked inside a function or outside -- not clear in your PR what might have fixed that.

@alexcrichton
Copy link
Member

@dtolnay sure yeah, worth documenting!

So I'm not really sure why, but the JointTree vs Tree is what's going on here. For whatever reason rustc seems to be parsing a JointTree inside a function and a Tree outside, I'm not really sure why. Assuming that though we know that the is_joint is true for the second macro. The is_joint is then later used for spacing in Op tokens, and in the op! macro you'll see how the last token uses the op_kind and all previous ones are Spacing::Joint.

Unfortunately I fat-fingered this a bit and in the doc comment expansion the code also uses op! as a sort of shortcut to TokenTree::Op. This is the bug, however, as the tokens are inheriting the is_joint variable and Joint spacing accidentally. The internal tokens don't have the same spacing, and we don't actually have any ability to communicate a Joint doc comment so that information needs to get lost (instead of preserved at a sort of random location).

In the PR I made these are both explicitly changed to Spacing::Alone as they're supposed to be (as the operators aren't actually joint with anything).

@dtolnay
Copy link
Member Author

dtolnay commented Apr 2, 2018

For whatever reason rustc seems to be parsing a JointTree inside a function and a Tree outside, I'm not really sure why.

This is really what I was interested in. I filed #49604 to follow up. Thanks!

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 5, 2018
…chenkov

proc_macro: Reorganize public API

This commit is a reorganization of the `proc_macro` crate's public user-facing
API. This is the result of a number of discussions at the recent Rust All-Hands
where we're hoping to get the `proc_macro` crate into ship shape for
stabilization of a subset of its functionality in the Rust 2018 release.

The reorganization here is motivated by experiences from the `proc-macro2`,
`quote`, and `syn` crates on crates.io (and other crates which depend on them).
The main focus is future flexibility along with making a few more operations
consistent and/or fixing bugs. A summary of the changes made from today's
`proc_macro` API is:

* The `TokenNode` enum has been removed and the public fields of `TokenTree`
  have also been removed. Instead the `TokenTree` type is now a public enum
  (what `TokenNode` was) and each variant is an opaque struct which internally
  contains `Span` information. This makes the various tokens a bit more
  consistent, require fewer wrappers, and otherwise provides good
  future-compatibility as opaque structs are easy to modify later on.

* `Literal` integer constructors have been expanded to be unambiguous as to what
  they're doing and also allow for more future flexibility. Previously
  constructors like `Literal::float` and `Literal::integer` were used to create
  unsuffixed literals and the concrete methods like `Literal::i32` would create
  a suffixed token. This wasn't immediately clear to all users (the
  suffixed/unsuffixed aspect) and having *one* constructor for unsuffixed
  literals required us to pick a largest type which may not always be true. To
  fix these issues all constructors are now of the form
  `Literal::i32_unsuffixed` or `Literal::i32_suffixed` (for all integral types).
  This should allow future compatibility as well as being immediately clear
  what's suffixed and what isn't.

* Each variant of `TokenTree` internally contains a `Span` which can also be
  configured via `set_span`. For example `Literal` and `Term` now both
  internally contain a `Span` rather than having it stored in an auxiliary
  location.

* Constructors of all tokens are called `new` now (aka `Term::intern` is gone)
  and most do not take spans. Manufactured tokens typically don't have a fresh
  span to go with them and the span is purely used for error-reporting
  **except** the span for `Term`, which currently affects hygiene. The default
  spans for all these constructed tokens is `Span::call_site()` for now.

  The `Term` type's constructor explicitly requires passing in a `Span` to
  provide future-proofing against possible hygiene changes. It's intended that a
  first pass of stabilization will likely only stabilize `Span::call_site()`
  which is an explicit opt-in for "I would like no hygiene here please". The
  intention here is to make this explicit in procedural macros to be
  forwards-compatible with a hygiene-specifying solution.

* Some of the conversions for `TokenStream` have been simplified a little.

* The `TokenTreeIter` iterator was renamed to `token_stream::IntoIter`.

Overall the hope is that this is the "final pass" at the API of `TokenStream`
and most of `TokenTree` before stabilization. Explicitly left out here is any
changes to `Span`'s API which will likely need to be re-evaluated before
stabilization.

All changes in this PR have already been reflected to the [`proc-macro2`],
`quote`, and `syn` crates. New versions of all these crates have also been
published to crates.io.

Once this lands in nightly I plan on making an internals post again summarizing
the changes made here and also calling on all macro authors to give the APIs a
spin and see how they work. Hopefully pending no major issues we can then have
an FCP to stabilize later this cycle!

[`proc-macro2`]: https://docs.rs/proc-macro2/0.3.1/proc_macro2/

Closes rust-lang#49596
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-decl-macros-2-0 Area: Declarative macros 2.0 (#39412)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants