Debug printing of combining characters is wrong #41922

clarfonthey · 2017-05-11T21:07:48Z

Minimal example:

fn main() {
    let s = "e\u{301}";
    println!("str: {:?}", s);
    println!("bytes: {:?}", s.chars().collect::<Vec<_>>());
}

(playground link)

Expected output is either:

str: "é"
bytes: ['e', '\u{301}']

Or:

str: "é"
bytes: ['e', '◌́']

Actual output:

str: "é"
bytes: ['e', '́']

Note that the combining accent prints over the single quote. This is confusing and shouldn't happen.


clarfonthey · 2017-05-11T21:12:31Z

cc @tbu- who made the change to debug printing and @alexcrichton who approved it

tbu- · 2017-05-11T21:57:32Z

Python seems to do the same thing.

>>> '\u0301'
'́'

tbu- · 2017-05-11T21:59:21Z

@clarcharr That is, do you know some implementation we could copy?

clarfonthey · 2017-05-11T22:21:54Z

@tbu- not that I can think of; the current way seems wrong, though. Perhaps we could just check if a character is within the combining character range?

I found this and it probably could help: http://stackoverflow.com/a/17052803

Perhaps we could make a similar script?

clarfonthey · 2017-05-11T22:40:25Z

Also got some help on Twitter for this:

https://twitter.com/FakeUnicode/status/862798986238873601

behnam · 2017-08-11T20:49:18Z

I would not consider this a bug, as it's common behavior not to touch or change Unicode characters when they are printed to stdout or a file, especially in debug output.

One reason not to do this is that it can easily mislead the user. Let's say I take the output and copy-paste it into a Unicode decoder to see which character is in that spot. I will see two codepoints in the decoder, one of which did not exist in the original string.

So, IMHO, there are pros to doing so, especially nicer-looking output, but the main con is that the Debug output no longer tells you the truth, which is very unfortunate, especially since there would be almost no way to work around it. I think it's better to keep these fancy features for the higher-level parts of the stack, like Display, instead of Debug.

If Rust wants to do anything special about these characters, the filter would be GC=Mn (Nonspacing_Mark). But, it should be noted that this would mean the result would depend on the Unicode version of the compiler, and newly assigned characters won't get the special treatment until the internal Unicode data of Rust gets updated.

That said, I think we also need to take a look at what other modern Unicode-savvy languages, like Swift, are doing in this area, before making a decision.

varkor · 2018-03-22T12:42:46Z

For reference, in Swift:

let str = "e\u{301}";
// Array of unicode scalars, equivalent to Rust's chars
print("\(Array(str.unicodeScalars))"); // ["e", "\u{0301}"]
// Array of unicode scalars converted into strings
print("\(Array(str.unicodeScalars).map({ String.init($0) }))"); // ["e", "́"]

Swift opts to print code points for unicode scalars (but when converted to strings they display as in Rust).
This seems like reasonable behaviour (@clarcharr's first suggestion).

varkor · 2018-03-22T13:01:25Z

This seems to have been deliberately changed to the current output as a result of #24588.

clarfonthey · 2018-03-22T16:17:19Z

I still just think that checking if the character is combining and then escaping if it's by itself is the best option.

varkor · 2018-03-22T16:44:43Z

Oh, I see: you already mentioned the earlier change! I agree: this would make sense for combining characters. The range described on Wikipedia should probably be sufficient?
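A minimal sketch of the "check the range, then escape a lone combining character" idea discussed above. The function names `in_combining_block` and `debug_char` are hypothetical, and the ranges are the five Unicode blocks dedicated to combining marks, assumed here to be the range the comment refers to. This is an illustration of the proposal, not the code that was eventually merged.

```rust
/// Hypothetical helper: true if `c` falls in one of the Unicode blocks
/// dedicated to combining marks. This is a block-range approximation,
/// not a General_Category lookup.
fn in_combining_block(c: char) -> bool {
    matches!(
        c as u32,
        0x0300..=0x036F       // Combining Diacritical Marks
            | 0x1AB0..=0x1AFF // Combining Diacritical Marks Extended
            | 0x1DC0..=0x1DFF // Combining Diacritical Marks Supplement
            | 0x20D0..=0x20FF // Combining Diacritical Marks for Symbols
            | 0xFE20..=0xFE2F // Combining Half Marks
    )
}

/// Hypothetical stand-in for a char's Debug output: quote the character,
/// but write a lone combining mark as a `\u{...}` escape.
/// Other escapes (quotes, control characters) are ignored for brevity.
fn debug_char(c: char) -> String {
    if in_combining_block(c) {
        format!("'\\u{{{:x}}}'", c as u32)
    } else {
        format!("'{}'", c)
    }
}
```

With this sketch, `debug_char('\u{301}')` yields `'\u{301}'` while `debug_char('e')` yields `'e'`. A block-range check is narrower than the GC=Mn filter mentioned earlier in the thread: U+0483 (COMBINING CYRILLIC TITLO), for example, is a nonspacing mark that lies outside all five blocks.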

@alexcrichton

Escape combining characters in char::Debug

Although combining characters are technically printable, they make little sense to print on their own with `Debug`: it'd be better to escape them like non-printable characters.

This is a breaking change, but since `escape_debug` is rarely used, and almost certainly used primarily for debugging, I imagine this is an acceptable change.

Resolves #41922.

r? @alexcrichton
cc @clarcharr
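For readers who want to observe the effect of this change, here is a small sketch assuming a toolchain that already includes it. The wrapper `debug_repr` is a hypothetical name introduced only for illustration; `format!("{:?}", …)` and `char::escape_debug` are the standard library pieces being exercised.

```rust
/// Hypothetical wrapper: returns the `Debug` rendering of a single `char`.
fn debug_repr(c: char) -> String {
    format!("{:?}", c)
}

/// Hypothetical wrapper: returns the `escape_debug` form of a `char`,
/// i.e. the Debug rendering without the surrounding single quotes.
fn escaped(c: char) -> String {
    c.escape_debug().to_string()
}
```

On such a toolchain, `debug_repr('\u{301}')` should give `'\u{301}'` rather than an accent drawn over the quote, while `debug_repr('e')` is still `'e'`.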

Mark-Simulacrum added the A-Unicode (Area: Unicode) and T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.) labels Jun 22, 2017

Mark-Simulacrum added the C-bug (Category: This is a bug.) label Jul 27, 2017

varkor mentioned this issue Mar 22, 2018

Escape combining characters in char::Debug #49283

Merged

bors closed this as completed in #49283 May 22, 2018
