EncodeForRegExpEscape should not return results that require particular flags #69

gibson042 · 2024-03-27T18:04:37Z

EncodeForRegExpEscape step 4.e (which would be reached if input c were a Space_Separator supplementary code point in [U+10000, U+10FFFF]) results in a return value like \u{…}. The interpretation of such pattern text is dependent upon regular expression flags—specifically, it is interpreted as a |RegExpUnicodeEscapeSequence| that will match a code point with the contained hexadecimal value in the presence of a "u" or "v" flag, but otherwise is interpreted as either a syntax error or (only in a host supporting Annex B and only when the hexadecimal representation of the code point consists only of decimal digits) as a quantified |ExtendedAtom| "u" with the specified decimal count of repetitions (e.g., /^\u{10000}$/.test("u".repeat(10000)) is true).
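To make the flag dependence concrete, here is a small sketch. It uses U+10000 only because its hexadecimal digits are all decimal; it is not itself a Space_Separator. It also assumes an engine with Annex B support (a browser or Node.js), and the variable names are illustrative.

```javascript
// The same pattern text, compiled with and without the "u" flag.
const source = "^\\u{10000}$";

// With "u", the braces form a code point escape for U+10000.
const unicodeMatch = new RegExp(source, "u").test("\u{10000}");

// Without "u" (Annex B), \u is a literal "u" and {10000} is a quantifier.
const legacy = new RegExp(source);
const legacyMatchesCodePoint = legacy.test("\u{10000}");
const legacyMatchesRepeatedU = legacy.test("u".repeat(10000));

console.log({ unicodeMatch, legacyMatchesCodePoint, legacyMatchesRepeatedU });
```

Under those assumptions, the first and third results should be true and the second false, which is the conditional interpretation described above.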

Rather than returning results subject to conditional interpretation, EncodeForRegExpEscape should return a \u…\u… surrogate pair |RegExpUnicodeEscapeSequence| for such inputs (which work in both Unicode and non-Unicode regular expressions, e.g. /^\uD834\uDF06$/u.test("𝌆") and /^\uD834\uDF06$/v.test("𝌆") and /^\uD834\uDF06$/.test("𝌆") are all true).
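A rough sketch of what escaping by code unit could look like follows. The helper name `escapeByCodeUnits` is hypothetical and the lowercase hex formatting is an assumption; this is not the spec text, only an illustration of why the output no longer depends on flags.

```javascript
// Hypothetical helper: emit one \uXXXX escape per UTF-16 code unit,
// so a supplementary code point becomes a surrogate pair of escapes.
function escapeByCodeUnits(codePoint) {
  const str = String.fromCodePoint(codePoint);
  let out = "";
  for (let i = 0; i < str.length; i++) {
    out += "\\u" + str.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

const escaped = escapeByCodeUnits(0x1d306); // U+1D306, the "𝌆" used above
const matchesWithoutFlag = new RegExp("^" + escaped + "$").test("\u{1D306}");
const matchesWithU = new RegExp("^" + escaped + "$", "u").test("\u{1D306}");

console.log(escaped, matchesWithoutFlag, matchesWithU);
```

The same escaped text matches the code point whether or not the "u" flag is present, because a \uXXXX\uXXXX pair is read as a surrogate pair in Unicode mode and as two consecutive code units otherwise.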

Or alternatively (and preferably IMO), EncodeForRegExpEscape should not escape all white space. I'm not certain why it does so right now, but looking back I suspect it is due to a misinterpretation of #30 (which requests escaping of control characters, and even more specifically line terminators—and even that isn't necessary).

bakkot · 2024-03-27T18:09:36Z

Whitespace is escaped to leave room for /x mode regexps in the future.

ljharb · 2024-03-27T20:52:05Z

So to make sure I understand the issue properly, this would be solved if done by code units, and not code points?

jridgewell · 2024-03-27T23:24:16Z

Yes, but I think it is possible that a Space_Separator will be added in the future in the supplementary U+10000-U+10FFFF range, so we would end up adding this same support later anyway.

gibson042 mentioned this issue Mar 27, 2024

Path to Stage 4! #58

Closed

33 tasks

ljharb added a commit that referenced this issue Mar 27, 2024

[spec] handle surrogate pairs

fed03d7

Fixes #69

ljharb mentioned this issue Mar 27, 2024

[spec] handle surrogate pairs #71

Merged

ljharb added a commit that referenced this issue Mar 27, 2024

[spec] handle surrogate pairs

f527b0d

Fixes #69

ljharb closed this as completed in 21cdd91 Mar 28, 2024
