Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Work around CLR unicode bug with classification of U+2028 #9519

Closed

Conversation

gafter
Copy link
Member

@gafter gafter commented Mar 7, 2016

Fixes #9484

@dotnet/roslyn-compiler Please review

@gafter gafter added Bug Concept-API This issue involves adding, removing, clarification, or modification of an API. Area-Compilers 4 - In Review A fix for the issue is submitted for review. labels Mar 7, 2016
@gafter gafter self-assigned this Mar 7, 2016
@gafter gafter added this to the 2.0 (Preview) milestone Mar 7, 2016
@cston
Copy link
Member

cston commented Mar 7, 2016

LGTM

@jaredpar
Copy link
Member

jaredpar commented Mar 7, 2016

CC @ellismg

@jaredpar
Copy link
Member

jaredpar commented Mar 7, 2016

@gafter future-stabilization is basically an ask mode branch. This doesn't intuitively seem like an ask mode issue. Am I missing a case?

@gafter
Copy link
Member Author

gafter commented Mar 7, 2016

Why did you assign the issue to 2.0 (Preview)? Isn't that the milestone for this preview?

@jaredpar
Copy link
Member

jaredpar commented Mar 7, 2016

@gafter ah okay. That's my mistake.

// U+2028 "LINE SEPARATOR" should be classified by CharUnicodeInfo.GetUnicodeCategory as
// UnicodeCategory.LineSeparator but due to a bug is incorrectly categorized as
// UnicodeCategory.Format. See https://github.com/dotnet/coreclr/issues/3542
replaceWith = "\\u2028";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of calls to CharUnicodeInfo.GetUnicodeCategory in our codebase and some of those sites can still suffer from the bug in this API. For example (in src\Compilers\Core\Portable\UnicodeCharacterUtilities.cs):

            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(ch);
            return IsLetterChar(cat)
                || IsDecimalDigitChar(cat)
                || IsConnectingChar(cat)
                || IsCombiningChar(cat)
                || IsFormattingChar(cat);

Instead of patching this particular call site in this manner, we should consider creating a wrapper for CharUnicodeInfo.GetUnicodeCategory, which would make necessary adjustments to the value it returns and simply change all call sites to use that wrapper.

@gafter
Copy link
Member Author

gafter commented Mar 7, 2016

Moving to the future branch.

@gafter gafter closed this Mar 7, 2016
@gafter gafter removed the 4 - In Review A fix for the issue is submitted for review. label Mar 7, 2016
@gafter gafter deleted the future-stabilization-9484 branch May 24, 2018 19:05
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Area-Compilers Bug cla-already-signed Concept-API This issue involves adding, removing, clarification, or modification of an API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants