Skip to content

[ER] Line number for 'stream did not contain valid UTF-8' errors #76869

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
leonardo-m opened this issue Sep 18, 2020 · 6 comments · Fixed by #135557
Closed

[ER] Line number for 'stream did not contain valid UTF-8' errors #76869

leonardo-m opened this issue Sep 18, 2020 · 6 comments · Fixed by #135557
Labels
A-diagnostics Area: Messages for errors, warnings, and lints D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@leonardo-m
Copy link

If by mistake I write code as this:

fn main() {
    println!("Hello ’world");
}

All I get from rustc is:

error: couldn't read ...\test.rs: stream did not contain valid UTF-8

But I'd like to receive a line number where such invalidity happens.

Using: rustc 1.48.0-nightly f3c923a13 2020-09-17 x86_64-pc-windows-gnu
@tesuji
Copy link
Contributor

tesuji commented Sep 18, 2020

If rustc read source file with fs::read_to_string, that's maybe hard to report which line number contains invalid UTF-8.
Just my guess.

@jyn514 jyn514 added A-diagnostics Area: Messages for errors, warnings, and lints D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 18, 2020
@leonardo-m
Copy link
Author

To spot the problem I've used a Python script like:

for i, line in enumerate(open("mycode.rs")):
    if any(ord(c) >= 127 for c in line):
        print(i)

@SagnikHajra
Copy link

Also faced the same issue. I was using VS code.
>>> for i, line in enumerate(open("hello.rs")): ... print(line) ... ÿþfn main() {

No idea how did I end up with "ÿþ". Created a new file and it worked.

@Kevpopper
Copy link

I got same issue but i just changed .rs to .txt

@ms0503
Copy link

ms0503 commented Dec 24, 2024

"ÿþ" is U+00FF and U+00FE in Unicode.
These are byte order mark (BOM) in little-endian UTF-16.

If you write your source code in UTF-8 without BOM, it should work.

estebank added a commit to estebank/rust that referenced this issue Jan 15, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this issue Jan 15, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
@estebank
Copy link
Contributor

#135557 will provide more information for these errors. For the case where the first byte of the "first" (entry) file is incorrect, we don't get a proper span to point at, so we customize the output to include the byte position in the message itself. It's not ideal, but it is better than what we had. I also encountered multiple places in our test suite that expect rust files to be valid utf-8 or crash 😄

estebank added a commit to estebank/rust that referenced this issue Jan 18, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this issue Jan 18, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: `[193]` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
estebank added a commit to estebank/rust that referenced this issue Jan 22, 2025
```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

CC rust-lang#76869, rust-lang#135557.
jieyouxu added a commit to jieyouxu/rust that referenced this issue Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 22, 2025
Point at invalid utf-8 span on user's source code

```
error: couldn't read `$DIR/not-utf8-bin-file.rs`: stream did not contain valid UTF-8
  --> $DIR/not-utf8-2.rs:6:5
   |
LL |     include!("not-utf8-bin-file.rs");
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: byte `193` is not valid utf-8
  --> $DIR/not-utf8-bin-file.rs:2:14
   |
LL |     let _ = "�|�␂!5�cc␕␂��";
   |              ^
   = note: this error originates in the macro `include` (in Nightly builds, run with -Z macro-backtrace for more info)
```

When we attempt to load a Rust source code file, if there is a OS file failure we try reading the file as bytes. If that succeeds we try to turn it into UTF-8. If *that* fails, we provide additional context about *where* the file has the first invalid UTF-8 character.

Fix rust-lang#76869.
@bors bors closed this as completed in 57dd42d Jan 23, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants