Skip to content

Underscore suffixes allowed in literals #41723

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
qnighy opened this issue May 3, 2017 · 10 comments
Closed

Underscore suffixes allowed in literals #41723

qnighy opened this issue May 3, 2017 · 10 comments
Labels
A-grammar Area: The grammar of Rust A-parser Area: The lexing & parsing of Rust source code to an AST T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@qnighy
Copy link
Contributor

qnighy commented May 3, 2017

I found these pathlogical examples pass the lexer and the parser:

fn main() {
    println!("{}", "Foo"_);
    println!("{}", 10._);
}

The reason is that

  1. scan_optional_raw_name eats _, but reports no suffix.
  2. scan_number distinguishes between decimal points and field/method syntax by is_xid_start, which in fact excludes _.

According to the reference, it seems these examples shouldn't be allowed.

Tested on stable (rustc 1.17.0 (56124baa9 2017-04-24)) and nightly (rustc 1.19.0-nightly (6a5fc9eec 2017-05-02)) on the playground.

@qnighy
Copy link
Contributor Author

qnighy commented May 3, 2017

According to the reference, decimal points are optionally followed by another "decimal literal". Since _ isn't a decimal literal, it shouldn't be allowed.

Here the parser is recognizing _ in 10._ as a suffix rather than a decimal literal followed by the point. If not, 10._f64 would be lexed as 10._+f64, but actually the parser complains _f64 is not a valid suffix.

In addition, the reference says f32 and f64 are the only valid suffixes. So _ shouldn't be allowed as a suffix either.

@nagisa nagisa added A-grammar Area: The grammar of Rust A-parser Area: The lexing & parsing of Rust source code to an AST I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 3, 2017
@nikomatsakis
Copy link
Contributor

It seems to me that _ should be considered part of the number. So I would expect 10._f64 to work, indeed, just as 10.2_f64 or 2_f64 works.

@arielb1 points out that "a"_ is also accepted, which seems like a bug.

Basically, I do not expect a suffix to begin with _, I think, but I do expect _ to be allowed in numeric literals anywhere (including the end).

@arielb1 arielb1 added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label May 11, 2017
@nikomatsakis nikomatsakis removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 11, 2017
@nikomatsakis
Copy link
Contributor

Shifting nomination to T-lang for now.

@joshtriplett
Copy link
Member

The language team discussed this in today's meeting. We also considered cases like 42.method_on_numbers(), or 42._method_on_numbers(), both of which ought to be parsed as methods. The conclusion we came to was that it's ambiguous (for our mental lexers if nothing else) to allow a trailing underscore on a floating-point literal immediately following a . with no subsequent digits. Parsing _[a-z] in a way that puts the _ and the [a-z] in separate tokens feels incredibly confusing. It's not unreasonable to have to write 42.0f64 or 42.0_f64 instead if you want a suffix.

(By contrast, 42_f64 or 42_u64 seems fine, and unambiguous, though whether you should write that might be a good question for the style team.)

So, the recommendation from the language team would be to always parse 42._ident as a field or method, and never include the _ as part of the floating-point literal.

One question that didn't come up in the language team meeting: does anything rely on the current (apparently inconsistent) lexing behavior here? Would fixing this ambiguity break anything?

@qnighy
Copy link
Contributor Author

qnighy commented May 12, 2017

I'll try to write a patch for this issue.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 12, 2017

@joshtriplett

One question that didn't come up in the language team meeting: does anything rely on the current (apparently inconsistent) lexing behavior here? Would fixing this ambiguity break anything?

I would think not, at least for numbers, given that it results in an error. Not sure about ""_, which... apparently parses but does not error? (When do we ever support suffixes on strings, besides #?)

frewsxcv added a commit to frewsxcv/rust that referenced this issue May 13, 2017
…loat, r=petrochenkov

Disallow ._ in float literal.

This patch makes lexer stop parsing number literals before `._`, as well as before `.a`. Underscore itself is still allowed like in `4_000_000.000_000_`.

Fixes a half part of rust-lang#41723. The other is `""_`.
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue May 13, 2017
…loat, r=petrochenkov

Disallow ._ in float literal.

This patch makes lexer stop parsing number literals before `._`, as well as before `.a`. Underscore itself is still allowed like in `4_000_000.000_000_`.

Fixes a half part of rust-lang#41723. The other is `""_`.
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue May 14, 2017
…loat, r=petrochenkov

Disallow ._ in float literal.

This patch makes lexer stop parsing number literals before `._`, as well as before `.a`. Underscore itself is still allowed like in `4_000_000.000_000_`.

Fixes a half part of rust-lang#41723. The other is `""_`.
@qnighy
Copy link
Contributor Author

qnighy commented May 15, 2017

@joshtriplett

So, the recommendation from the language team would be to always parse 42._ident as a field or method, and never include the _ as part of the floating-point literal.

Do you mean by "never include" also forbidding literals like 4_000.000_000 ?

@joshtriplett
Copy link
Member

joshtriplett commented May 16, 2017

@qnighy No, that's fine. I meant specifically that 42._ident or similar should never lex as a floating-point number 42._ with suffix ident; an underscore immediately after the dot should not lex as part of the literal.

@qnighy
Copy link
Contributor Author

qnighy commented May 16, 2017

@joshtriplett I see. Then #41946 and #41990 will close this issue.

bors added a commit that referenced this issue Jun 6, 2017
…ike-literals, r=nikomatsakis

Disallow underscore suffix for string-like literals.

This patch turns string/bytestring/char/byte literals followed by an underscore, like `"Foo"_`, to an error.

`scan_optional_raw_name` will parse `_` as a valid raw name, but it will be rejected by the parser. I also considered just stopping parsing when the suffix is `_`, but in that case `"Foo"_` will be lexed as two valid tokens.

Fixes the latter half of #41723.
@qnighy
Copy link
Contributor Author

qnighy commented Jun 6, 2017

Closing since #41946 and #41990 are merged.

@qnighy qnighy closed this as completed Jun 6, 2017
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-grammar Area: The grammar of Rust A-parser Area: The lexing & parsing of Rust source code to an AST T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants