-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Recover error strings on Unix from_lossy_utf8 #99609
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
Some language settings can result in unreliable UTF-8 being produced. This can result in failing to emit the error string, panicking instead. from_lossy_utf8 allows us to assume these strings usually will be fine.
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
(rust-highfive has picked a reviewer for you, use r? to override) |
Aiui some legacy encodings such as shift-jis aren't an ascii-superset. So if there's a system using such a locale they'd still garbage and this wouldn't really be a fix. |
Alas, "Not output garbage" is not the goal of this PR. A more complicated fix requires more nuanced handling, you are correct, but immediately after writing this, I started doing many more smaller changes to improve things further until I realized the The primary location to find Shift-JIS is in fact on Windows platforms, which have an even more eccentric interpretation of it than you might imagine. I am quite familiar with how Shift-JIS breaks systems, as it is what gives the phenomenon of incorrect character emission the name mojibake. |
I'm not understanding the relevance of Windows here? In Rust for Windows, In Windows itself all local encodings are lossily converted to/from UTF-16. I guess the only issue would be reading the stdout of a third party tool that spits out a locale-specific encoding. |
@ChrisDenton The only relevance is that to Actually Fix the Issue so that we don't ever emit garbage would require changing the signature of And I may or may not get around to doing all that. However, here the standard library is panicking needlessly while trying to construct a mere error report. Emitting junk data is genuinely preferable. "Really fixing it" can be left to future work. |
To be clear: Before I ever touched programming, I did localization. If you have questions about fonts, locales, languages, or encodings, I will happily answer them. But I would rather save any such explanations to the comments and commit messages of my intended future PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@workingjubilee asked me to take a look at this, and I think this makes sense given the justification in #99609 (comment), that this is just to avoid panicking and producing a good error message on a best-effort basis.
That said I won't snipe the review from the assigned reviewer, but r=me if they don't want to take it.
I'll also try to be clear. I do not have a problem with this PR at all. I think it's a good improvement over the current behavior. I would however be much more nervous about a more substantial fix that potentially also affects Windows as the original issue demonstrated a *nix specific problem. But I agree, this conversation can be better had at a more appropriate time. |
@bors r+ |
Sorry to have missed this. LGTM as well. Thanks @thomcc. |
… r=thomcc Recover error strings on Unix from_lossy_utf8 Some language settings can result in unreliable UTF-8 being produced. This can result in failing to emit the error string, panicking instead. from_lossy_utf8 allows us to assume these strings usually will be fine. This fixes rust-lang#99535.
☀️ Test successful - checks-actions |
Finished benchmarking commit (8e9c93d): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
Some language settings can result in unreliable UTF-8 being produced.
This can result in failing to emit the error string, panicking instead.
from_lossy_utf8 allows us to assume these strings usually will be fine.
This fixes #99535.