Debug printing of combining characters is wrong #41922
Comments
cc @tbu- who made the change to debug printing and @alexcrichton who approved it
Python seems to do the same thing.
@clarcharr That is, do you know some implementation we could copy?
@tbu- not that I can think of; the current way seems wrong, though. Perhaps we could just check if a character is within the combining character range? I found this and it probably could help: http://stackoverflow.com/a/17052803 Perhaps we could make a similar script?
Also got some help on Twitter for this:
I would not consider this a bug, as it's common behavior not to touch or change Unicode characters when they are printed to stdout or a file, especially in debug mode. One reason not to do this is that it can easily mislead the user. Say I copy the output and paste it into a Unicode decoder to see which character is in that spot. I will see two code points in the decoder, one of which did not exist in the original string. So, IMHO, there are pros to doing this, especially nicer-looking output, but the main con is that the Debug output would not tell you the truth, which is very unfortunate, especially since there would be almost no way to work around it! I think it's better to keep these fancy features for the high-level parts of a stack, like If Rust wants to do anything special about these characters, the filter would be That said, I think we should also look at what other modern Unicode-savvy languages, like Swift, are doing in this area before making a decision.
For reference, in Swift:

    let str = "e\u{301}"
    // Array of unicode scalars, equivalent to Rust's chars
    print("\(Array(str.unicodeScalars))") // ["e", "\u{0301}"]
    // Array of unicode scalars converted into strings
    print("\(Array(str.unicodeScalars).map({ String.init($0) }))") // ["e", "́"]

Swift opts to print code points for unicode scalars (but when converted to strings they display as in Rust).
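For comparison, here is a rough Rust counterpart of the Swift snippet above. It is only a sketch: the helper names `scalars` and `scalar_strings` are made up for illustration, and no expected output is shown for the `{:?}` lines because how `Debug` renders U+0301 is exactly what this issue is about.

```rust
/// Hypothetical helper: split a string into its Unicode scalar values (Rust `char`s).
fn scalars(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Hypothetical helper: turn each scalar into its own one-character `String`.
fn scalar_strings(s: &str) -> Vec<String> {
    s.chars().map(|c| c.to_string()).collect()
}

fn main() {
    let s = "e\u{301}";

    // Debug-print the chars and the per-char strings.
    println!("{:?}", scalars(s));
    println!("{:?}", scalar_strings(s));

    // Explicit escaping that does not depend on the `Debug` impl.
    for c in s.chars() {
        println!("{}", c.escape_unicode());
    }
}
```

`char::escape_unicode` always produces a `\u{...}` escape, so it can serve as an unambiguous reference next to whatever `Debug` prints.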
This seems to have been deliberately changed to the current output as a result of #24588.
I still just think that checking if the character is combining and then escaping if it's by itself is the best option.
Oh, I see: you already mentioned the earlier change! I agree: this would make sense for combining characters. The range described on Wikipedia should probably be sufficient?
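To make the range-check idea concrete, here is a minimal sketch. It assumes "combining character" means the Unicode blocks named for combining marks; `is_combining_mark` and `debug_char` are hypothetical helpers, not anything in the standard library, and this is not necessarily the check the eventual fix used. It also ignores the other escapes a real `Debug` impl needs (quotes, backslashes, control characters).

```rust
/// Hypothetical helper: true if `c` lies in one of the Unicode blocks
/// dedicated to combining marks.
fn is_combining_mark(c: char) -> bool {
    matches!(c,
        '\u{0300}'..='\u{036F}'   // Combining Diacritical Marks
        | '\u{1AB0}'..='\u{1AFF}' // Combining Diacritical Marks Extended
        | '\u{1DC0}'..='\u{1DFF}' // Combining Diacritical Marks Supplement
        | '\u{20D0}'..='\u{20FF}' // Combining Diacritical Marks for Symbols
        | '\u{FE20}'..='\u{FE2F}' // Combining Half Marks
    )
}

/// Hypothetical helper: wrap a char in single quotes, escaping it as
/// `\u{...}` when it is a lone combining mark.
fn debug_char(c: char) -> String {
    if is_combining_mark(c) {
        format!("'{}'", c.escape_unicode())
    } else {
        format!("'{}'", c)
    }
}

fn main() {
    println!("{}", debug_char('e'));
    println!("{}", debug_char('\u{301}'));
}
```

A block-based check like this is simple but misses combining marks that live inside script-specific blocks, so a property-based test (for example a generated Unicode table) would likely be more complete.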
Escape combining characters in char::Debug
Although combining characters are technically printable, they make little sense to print on their own with `Debug`: it'd be better to escape them like non-printable characters. This is a breaking change, but I imagine that, since `escape_debug` is rarely used and almost certainly used primarily for debugging, this is an acceptable change.
Resolves #41922. r? @alexcrichton cc @clarcharr
Minimal example:
(playground link)
Expected output is either:
Or:
Actual output:
Note that the combining accent prints over the single quote. This is confusing and shouldn't happen.
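The original snippet and its outputs are not preserved here. A reproduction along these lines would show the same kind of behavior; it assumes the character in question is U+0301 (COMBINING ACUTE ACCENT), and `debug_repr` is a hypothetical helper added only for illustration.

```rust
/// Hypothetical helper: the `Debug` representation of a single char.
fn debug_repr(c: char) -> String {
    format!("{:?}", c)
}

fn main() {
    // Assumed character: U+0301 COMBINING ACUTE ACCENT.
    let accent = '\u{301}';

    // What `Debug` prints; if the mark is emitted raw, a terminal will
    // draw it on top of the opening single quote.
    println!("{}", debug_repr(accent));

    // An unambiguous escaped form, for comparison.
    println!("'{}'", accent.escape_unicode());
}
```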