four-byte Unicode characters confuse ' #28851

Closed
steveklabnik opened this issue Oct 5, 2015 · 7 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints

Comments

@steveklabnik
Member

This Rust program:

fn main() {
    let len = 'ஶ்ரீ'.len_utf8();
}

contains TAMIL SYLLABLE SHRII (śrī), aka U+0BB6 U+0BCD U+0BB0 U+0BC0. When trying to compile this program, I get this error:

2:20: 2:22 error: unterminated character constant: '.
2     let len = 'ஶ்ரீ'.len_utf8();

I know that it isn't a copy-paste issue, because I used vim's C-V u to type in the four code points manually.

@steveklabnik steveklabnik added the A-parser Area: The lexing & parsing of Rust source code to an AST label Oct 5, 2015
@arielb1
Contributor

arielb1 commented Oct 5, 2015

Aren't char-s Unicode scalar values?

@steveklabnik
Member Author

@arielb1 Yes. Am I doing something wrong here? I am bad at encodings, so this is likely.

@arielb1
Contributor

arielb1 commented Oct 5, 2015

@steveklabnik

Unicode scalar value != character.

@steveklabnik
Member Author

Yes, but it's four bytes, no?

@steveklabnik
Member Author

Ahhh this isn't actually four bytes. Sigh. Thanks.

@steveklabnik
Member Author

(basically, I thought that those four things were four bytes, but they're four codepoints themselves)
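
For readers following along, the distinction above can be checked directly. This is a minimal sketch, not part of the original thread: the helper name `byte_and_scalar_counts` is made up for illustration, and the syllable is written with `\u{...}` escapes so that the four code points are explicit. It only uses `str::len`, `str::chars`, and `char::len_utf8` from the standard library.

```rust
/// Hypothetical helper (not from the thread): returns the UTF-8 byte length
/// and the number of Unicode scalar values in `s`.
fn byte_and_scalar_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // TAMIL SYLLABLE SHRII, spelled out as its four code points.
    let shrii = "\u{0BB6}\u{0BCD}\u{0BB0}\u{0BC0}";

    let (bytes, scalars) = byte_and_scalar_counts(shrii);
    // Prints "12 bytes, 4 scalar values": each code point here needs
    // three bytes in UTF-8, so nothing about this syllable is "four bytes".
    println!("{} bytes, {} scalar values", bytes, scalars);

    // A `char` holds exactly one scalar value, so each of these fits in a
    // char literal on its own, but the whole syllable does not.
    for c in shrii.chars() {
        println!("U+{:04X} takes {} bytes in UTF-8", c as u32, c.len_utf8());
    }
}
```

Because the syllable is four scalar values, it has to be stored as a `&str` or `String` rather than a `char`.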

@steveklabnik steveklabnik reopened this Oct 5, 2015
@steveklabnik steveklabnik added A-diagnostics Area: Messages for errors, warnings, and lints and removed A-parser Area: The lexing & parsing of Rust source code to an AST labels Oct 5, 2015
@steveklabnik
Member Author

Actually, I am re-opening, because this diagnostic message is really bad. It should say that you're putting something that's larger than a single USV into a char literal.
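
As a rough illustration of the kind of check being asked for, here is a small sketch. It is an assumption for discussion only and is not how rustc's lexer is implemented: the function `check_char_literal` is hypothetical, it takes only the text between the quotes, and it ignores escape sequences such as `\n` or `\u{...}`.

```rust
/// Hypothetical validator (not rustc code): takes the text between the
/// single quotes of a char literal and reports a targeted error when it
/// holds more than one Unicode scalar value. Escapes are not handled.
fn check_char_literal(body: &str) -> Result<char, String> {
    let mut chars = body.chars();
    match (chars.next(), chars.next()) {
        // Exactly one scalar value: a valid char.
        (Some(c), None) => Ok(c),
        // Nothing between the quotes.
        (None, _) => Err("empty character literal".to_string()),
        // Two or more scalar values: say so, instead of "unterminated".
        (Some(_), Some(_)) => Err(format!(
            "character literal may only contain one codepoint, found {}",
            body.chars().count()
        )),
    }
}

fn main() {
    println!("{:?}", check_char_literal("a"));
    println!("{:?}", check_char_literal("\u{0BB6}\u{0BCD}\u{0BB0}\u{0BC0}"));
}
```

The point of the sketch is only that counting scalar values lets the message describe what the user actually wrote.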

steveklabnik added a commit to steveklabnik/rust that referenced this issue Nov 5, 2015
If you try to put something that's bigger than a char into a char
literal, you get an error:

    fn main() {
        let c = 'ஶ்ரீ';
    }

    error: unterminated character constant:

This is a very compiler-centric message. Yes, it's technically
'unterminated', but that's not what you, the user, did wrong.

Instead, this commit changes it to

    error: character literal may only contain one codepoint

As this actually tells you what went wrong.

Fixes rust-lang#28851
bors added a commit that referenced this issue Nov 5, 2015
If you try to put something that's bigger than a char into a char
literal, you get an error:

    fn main() {
        let c = 'ஶ்ரீ';
    }

    error: unterminated character constant:

This is a very compiler-centric message. Yes, it's technically
'unterminated', but that's not what you, the user, did wrong.

Instead, this commit changes it to

    error: character literal that's larger than a char:

As this actually tells you what went wrong.

Fixes #28851