This repository has been archived by the owner on Feb 18, 2025. It is now read-only.

EncodeForRegExpEscape should not return results that require particular flags #69

Closed
gibson042 opened this issue Mar 27, 2024 · 3 comments · Fixed by #71

Comments

@gibson042
Contributor

EncodeForRegExpEscape step 4.e (which would be reached if input c were a Space_Separator supplementary code point in [U+10000, U+10FFFF]) results in a return value like \u{…}. The interpretation of such pattern text is dependent upon regular expression flags—specifically, it is interpreted as a |RegExpUnicodeEscapeSequence| that will match a code point with the contained hexadecimal value in the presence of a "u" or "v" flag, but otherwise is interpreted as either a syntax error or (only in a host supporting Annex B and only when the hexadecimal representation of the code point consists only of decimal digits) as a quantified |ExtendedAtom| "u" with the specified decimal count of repetitions (e.g., /^\u{10000}$/.test("u".repeat(10000)) is true).
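To make the flag dependence concrete, here is a minimal sketch that compiles the same pattern text with and without the "u" flag. The `tryCompile` helper and the `results` object are names invented for this illustration, and the non-Unicode behavior shown assumes a host that supports Annex B (as web browsers and Node.js typically do); a host without Annex B would be expected to reject the flagless pattern instead.

```javascript
// Hypothetical helper: compile pattern text, returning null on a SyntaxError.
function tryCompile(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

const source = "^\\u{10000}$"; // the pattern text ^\u{10000}$
const codePoint = String.fromCodePoint(0x10000); // one supplementary code point
const manyUs = "u".repeat(10000); // 10000 copies of the letter "u"

const unicodeMode = tryCompile(source, "u");
const legacyMode = tryCompile(source, ""); // relies on Annex B to compile at all

// Same pattern text, different meaning depending on the flags.
const results = {
  unicodeMatchesCodePoint: unicodeMode !== null && unicodeMode.test(codePoint),
  unicodeMatchesManyUs: unicodeMode !== null && unicodeMode.test(manyUs),
  legacyMatchesCodePoint: legacyMode !== null && legacyMode.test(codePoint),
  legacyMatchesManyUs: legacyMode !== null && legacyMode.test(manyUs),
};
console.log(results);
```

With the "u" flag the pattern matches the single code point U+10000; without it, an Annex B host reads `\u` as a literal "u" followed by the quantifier `{10000}`.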

Rather than returning results subject to conditional interpretation, EncodeForRegExpEscape should return a \u…\u… surrogate pair |RegExpUnicodeEscapeSequence| for such inputs (which work in both Unicode and non-Unicode regular expressions, e.g. /^\uD834\uDF06$/u.test("𝌆") and /^\uD834\uDF06$/v.test("𝌆") and /^\uD834\uDF06$/.test("𝌆") are all true).
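A rough sketch of the surrogate-pair alternative follows. The function name `escapeAsCodeUnits` is hypothetical and is not the proposal's spec text; it only illustrates emitting one `\uXXXX` escape per UTF-16 code unit so that the result does not depend on flags. The check below covers the flagless and "u" cases; the "v" flag is left out only because it requires a newer engine.

```javascript
// Hypothetical helper: escape a code point as one \uXXXX per UTF-16 code unit.
function escapeAsCodeUnits(codePoint) {
  const s = String.fromCodePoint(codePoint); // 1 or 2 code units
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const hex = s.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    out += "\\u" + hex;
  }
  return out;
}

const escaped = escapeAsCodeUnits(0x1d306); // pattern text \uD834\uDF06
const target = "\u{1D306}"; // the character from the example above

// The same pattern text matches the same string with or without the "u" flag.
const matchesByFlag = ["", "u"].map(
  (flags) => new RegExp("^" + escaped + "$", flags).test(target)
);
console.log(escaped, matchesByFlag);
```

Because each escape names a single code unit, a non-Unicode regular expression matches the two code units in sequence, while a Unicode-mode one combines the pair into one code point.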

Or alternatively (and preferably IMO), EncodeForRegExpEscape should not escape all white space. I'm not certain why it does so right now, but looking back I suspect it is due to a misinterpretation of #30 (which requests escaping of control characters, and even more specifically line terminators—and even that isn't necessary).

@gibson042 gibson042 mentioned this issue Mar 27, 2024
33 tasks
@bakkot
Collaborator

bakkot commented Mar 27, 2024

Whitespace is escaped to leave room for /x mode regexps in the future.

@ljharb
Member

ljharb commented Mar 27, 2024

So to make sure I understand the issue properly, this would be solved if done by code units, and not code points?

@jridgewell
Member

Yes, but I think it is possible that a Space_Separator will be added in the future in the supplementary range U+10000 to U+10FFFF, so we would end up adding this same support later anyway.

ljharb added a commit that referenced this issue Mar 27, 2024
ljharb added a commit that referenced this issue Mar 27, 2024
@ljharb ljharb closed this as completed in 21cdd91 Mar 28, 2024
4 participants