Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Work around CLR unicode bug with classification of U+2028 #9519

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions src/Compilers/CSharp/Portable/SymbolDisplay/ObjectDisplay.cs
Original file line number Diff line number Diff line change
Expand Up @@ -149,8 +149,7 @@ private static bool TryReplaceQuote(char c, char quote, out string replaceWith)

/// <summary>
/// Returns true if the character should be replaced and sets
/// <paramref name="replaceWith"/> to the replacement text if the
/// character is replaced with text other than the Unicode escape sequence.
/// <paramref name="replaceWith"/> to the replacement text.
/// </summary>
private static bool TryReplaceChar(char c, out string replaceWith)
{
Expand Down Expand Up @@ -184,6 +183,12 @@ private static bool TryReplaceChar(char c, out string replaceWith)
case '\v':
replaceWith = "\\v";
break;
case '\u2028':
// U+2028 "LINE SEPARATOR" should be classified by CharUnicodeInfo.GetUnicodeCategory as
// UnicodeCategory.LineSeparator but due to a bug is incorrectly categorized as
// UnicodeCategory.Format. See https://github.com/dotnet/coreclr/issues/3542
replaceWith = "\\u2028";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of calls to CharUnicodeInfo.GetUnicodeCategory in our codebase and some of those sites can still suffer from the bug in this API. For example (in src\Compilers\Core\Portable\UnicodeCharacterUtilities.cs):

            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(ch);
            return IsLetterChar(cat)
                || IsDecimalDigitChar(cat)
                || IsConnectingChar(cat)
                || IsCombiningChar(cat)
                || IsFormattingChar(cat);

Instead of patching this particular call site in this manner, we should consider creating a wrapper for CharUnicodeInfo.GetUnicodeCategory, which would make necessary adjustments to the value it returns and simply change all call sites to use that wrapper.

break;
}

if (replaceWith != null)
Expand Down
8 changes: 8 additions & 0 deletions src/Compilers/CSharp/Test/Syntax/Syntax/SyntaxFactoryTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,14 @@ public void TestLiteralToStringDifferentCulture()
CultureInfo.CurrentCulture = culture;
}

[WorkItem(9484, "https://github.com/dotnet/roslyn/issues/9484")]
[Fact]
public void TestEscapeLineSeparator()
{
var literal = SyntaxFactory.Literal("\u2028");
Assert.Equal("\"\\u2028\"", literal.Text);
}

private static void CheckLiteralToString(dynamic value, string expected)
{
var literal = SyntaxFactory.Literal(value);
Expand Down