Emoji in label/lifetime recovered as character literal (rather than identifier) #108019

Closed
izik1 opened this issue Feb 14, 2023 · 3 comments · Fixed by #108031
Labels
A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

izik1 commented Feb 14, 2023

Code

fn bar() {
    '🐱 loop {
        break
    }
}

Current output

error[E0762]: unterminated character literal
 --> src/lib.rs:2:5
  |
2 |     '🐱 loop {
  |     ^^^^^^^^^^

Desired output

error: identifiers cannot contain emoji
 --> src/lib.rs:2:5
  |
2 |     '🐱: loop {
  |      ^^

Or something similar to the existing diagnostic for an emoji in an ordinary identifier:

fn bar() {
    let 🐱 = ();
}

error: identifiers cannot contain emoji: `🐱`
 --> src/lib.rs:2:9
  |
2 |     let 🐱 = ();
  |         ^^

Perhaps with a "= help: did you mean to use a character literal?" note when applicable.

Rationale and extra context

I feel the rationale is self-explanatory, however, if it ends up not being such, I can provide one upon request.

Other cases

Small aside: I originally wrote this all for 🥺, but that is bizarrely not recognized in identifiers at all (it gives an error: unknown start of token: \u{1f97a}). After realizing that some emoji are handled better, I decided to use 🐱. I specifically avoided 🦀 because it has extra-special handling ("Ferris cannot be used as an identifier").

Another case is, as mentioned prior, in lifetime names (as far as I'm aware, this is the same underlying cause: the emoji causes the token to be a character literal):

fn foo<'🐱>() -> &'🐱 () {
   &()
}

which gives 2 errors:

error: character literal may only contain one codepoint
 --> src/lib.rs:1:8
  |
1 | fn foo<'🐱>() -> &'🐱 () {
  |        ^^^^^^^^^^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
1 | fn foo<"🐱>() -> &"🐱 () {
  |        ~~~~~~~~~~~~

error: expected one of `#`, `>`, `const`, identifier, or lifetime, found `'🐱>() -> &'`
 --> src/lib.rs:1:8
  |
1 | fn foo<'🐱>() -> &'🐱 () {
  |        ^^^^^^^^^^^^ expected one of `#`, `>`, `const`, identifier, or lifetime

The following sample also has very different output (and probably closer to the expected output, although it's not without its own weirdness):

fn bar() {
    'a🐱: loop {}
}

error: malformed loop label
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^ help: use the correct loop label format: `'🐱`

error: expected `while`, `for`, `loop` or `{` after a label
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^ expected `while`, `for`, `loop` or `{` after a label
  |
help: consider removing the label
  |
6 -     'a🐱: loop {}
6 +     🐱: loop {}
  |

error: labeled expression must be followed by `:`
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |     ---^^^^^^^^^^
  |     | |
  |     | help: add `:` after the label
  |     the label
  |
  = note: labels are used before loops and blocks, allowing e.g., `break 'label` to them

error: identifiers cannot contain emoji: `🐱`
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^

warning: unused label
 --> src/lib.rs:6:7
  |
6 |     'a🐱: loop {}
  |       ^^
  |
  = note: `#[warn(unused_labels)]` on by default

warning: `playground` (lib) generated 1 warning
error: could not compile `playground` due to 4 previous errors; 1 warning emitted

Anything else?

No response

izik1 added the A-diagnostics and T-compiler labels on Feb 14, 2023
@jieyouxu
Member

This looks like a fun one, would like to give it a try 🥺

@rustbot claim

@jieyouxu
Member

For the '🐱 loop { case, it is incorrectly lexed as a character literal:

DEBUG rustc_parse::lexer next_token: Literal { kind: Char { terminated: false }, suffix_start: 12 }("'🐱 loop {")
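To make the decision point concrete, here is a minimal sketch of how a lexer might choose between "lifetime" and "character literal" after seeing a `'`. It is not the rustc_lexer source: `classify`, `QuoteToken`, `looks_like_emoji`, and the `recover_emoji` flag are all hypothetical names, `is_id_start` is approximated with `char::is_alphabetic` instead of real Unicode XID tables, and escape sequences are ignored.

```rust
/// What a token starting with `'` is classified as in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum QuoteToken {
    Lifetime,
    CharLiteral { terminated: bool },
}

/// Rough stand-in for an XID_Start check (assumption: a real lexer
/// would consult Unicode tables rather than `is_alphabetic`).
fn is_id_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Hypothetical, very coarse emoji check: a single block range that
/// happens to contain U+1F431 (🐱). Not an accurate emoji definition.
fn looks_like_emoji(c: char) -> bool {
    matches!(c as u32, 0x1F300..=0x1FAFF)
}

/// Classify the token at the start of `src`, which must begin with `'`.
/// With `recover_emoji == false`, an emoji after the quote cannot start
/// a lifetime, so the token falls through to the char-literal path.
fn classify(src: &str, recover_emoji: bool) -> Option<QuoteToken> {
    let mut chars = src.chars();
    if chars.next()? != '\'' {
        return None;
    }
    let first = chars.next();
    let second = chars.next();

    let can_be_lifetime = match (first, second) {
        // `'x'` is a character literal, never a lifetime.
        (_, Some('\'')) => false,
        (Some(c), _) => {
            is_id_start(c)
                || c.is_ascii_digit()
                || (recover_emoji && looks_like_emoji(c))
        }
        (None, _) => false,
    };
    if can_be_lifetime {
        return Some(QuoteToken::Lifetime);
    }

    // Char-literal path: look for a closing quote on the same line.
    // (`'` is one byte in UTF-8, so slicing at index 1 is safe.)
    let terminated = src[1..]
        .chars()
        .take_while(|&c| c != '\n')
        .any(|c| c == '\'');
    Some(QuoteToken::CharLiteral { terminated })
}
```

Under these assumptions, `classify("'🐱 loop {", false)` yields an unterminated character literal, matching the debug output above, and `classify("'🐱>() -> &'🐱 () {", false)` yields a terminated one that swallows everything up to the second quote, which lines up with the span in the lifetime example. Passing `recover_emoji = true` shows one possible direction for recovery: treat the emoji as able to start a lifetime so a later stage can report it as an invalid identifier.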

@jieyouxu
Member

Okay, this is really weird:

[compiler/rustc_lexer/src/lib.rs:640] self.first() = '🥺'
[compiler/rustc_lexer/src/lib.rs:641] unic_emoji_char::is_emoji(self.first()) = false

[compiler/rustc_lexer/src/lib.rs:640] self.first() = '🐱'
[compiler/rustc_lexer/src/lib.rs:641] unic_emoji_char::is_emoji(self.first()) = true
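The thread does not explain why the two characters are treated differently. One way a table-driven check can disagree for two emoji is that its range table simply has no entry covering one of them. The sketch below illustrates only that mechanism: `EMOJI_RANGES` is a deliberately incomplete, made-up table and `is_emoji_in_table` is a hypothetical helper, neither taken from `unic_emoji_char`.

```rust
use std::cmp::Ordering;

/// Illustrative, deliberately incomplete table of inclusive code point
/// ranges, sorted by start. NOT the data shipped by any real crate.
const EMOJI_RANGES: &[(u32, u32)] = &[
    (0x1F400, 0x1F43E), // a range that contains U+1F431 (🐱)
    (0x1F600, 0x1F64F), // Emoticons block
];

/// Binary-search a sorted range table for the code point of `c`.
fn is_emoji_in_table(c: char, table: &[(u32, u32)]) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if cp < lo {
                Ordering::Greater // this range lies above the code point
            } else if cp > hi {
                Ordering::Less // this range lies below the code point
            } else {
                Ordering::Equal // code point falls inside this range
            }
        })
        .is_ok()
}
```

With this table, 🐱 (U+1F431) is found while 🥺 (U+1F97A, the code point shown in the "unknown start of token" error) is not, because no listed range reaches it. Whether the real discrepancy has this cause would need to be checked against the crate's actual data.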
