This repository was archived by the owner on Feb 16, 2024. It is now read-only.

Consideration for Perl-like (?[]) extended character classes instead of a flag #39

Closed
@rbuckton

I've been researching regular expression syntax in various languages and engines to inform possible future proposals to expand the ECMAScript regular expression syntax. One of the features I've been reviewing is Perl's Extended Bracketed Character Classes, which support operations such as:

  • Intersection (&)
  • Union (+ or |)
  • Subtraction (-)
  • Symmetric Difference (^)
  • Complement (!)
  • Grouping ((, ))

Such a character class is delimited by the tokens (?[ and ]). The expression inside can contain the above operators, whitespace (which is ignored), character classes, metacharacters (such as \p{..} and \s), and certain escape sequences (such as \x0a). This allows you to write complex character classes like the following (based on the examples in the explainer):

# non-ASCII digits
(?[ \p{Decimal_Number} - [0-9] ])

# spans of word/identifier letters of specific scripts
(?[ \p{Script=Khmer} & [\p{Letter}\p{Mark}\p{Number}] ])

# breaking spaces
(?[ \p{White_Space} - \p{Line_Break=Glue} ])

# non-ASCII emoji
(?[ \p{Emoji} - \p{ASCII} ])

As well as classes like the following (from the perlre documentation):

# Matches digits in the Thai or Laotian scripts
(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])
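
For comparison, here is a rough sketch of how some of the classes above can be approximated in current ECMAScript syntax, using lookaheads in place of set operators. It is only an emulation for single code points, not the proposed syntax. The constant names are made up for illustration, \p{Decimal_Number} is assumed to stand in for Perl's \p{Digit}, and the breaking-spaces example is left out because ECMAScript property escapes do not expose Line_Break.

```javascript
// Emulating set operations on single code points with lookaheads.
// The u flag is required for \p{...}. All constant names are hypothetical.

// Subtraction  A - B  written as (?!B)A
const nonAsciiDigit = /(?![0-9])\p{Decimal_Number}/u;
const nonAsciiEmoji = /(?!\p{ASCII})\p{Emoji}/u;

// Intersection  A & B  written as (?=B)A
const khmerWordChar = /(?=\p{Script=Khmer})[\p{Letter}\p{Mark}\p{Number}]/u;

// Union is an ordinary character class; combined with an intersection this
// approximates (?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])
const thaiOrLaoDigit = /(?=\p{Decimal_Number})[\p{Script=Thai}\p{Script=Lao}]/u;

console.log(nonAsciiDigit.test("7"));           // false: ASCII digit is subtracted
console.log(nonAsciiDigit.test("\u0667"));      // true: ARABIC-INDIC DIGIT SEVEN
console.log(nonAsciiEmoji.test("#"));           // false: "#" is Emoji but also ASCII
console.log(nonAsciiEmoji.test("\u{1F600}"));   // true: GRINNING FACE
console.log(khmerWordChar.test("\u1780"));      // true: KHMER LETTER KA
console.log(khmerWordChar.test("\u17D4"));      // false: KHMER SIGN KHAN is punctuation
console.log(thaiOrLaoDigit.test("\u0E51"));     // true: THAI DIGIT ONE
console.log(thaiOrLaoDigit.test("\u0ED1"));     // true: LAO DIGIT ONE
console.log(thaiOrLaoDigit.test("\u0E01"));     // false: THAI CHARACTER KO KAI is a letter
```

The lookahead forms get harder to read once operations are nested or mixed, which is the readability gap the operator-based notation is meant to close.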

Currently, (?[ is not valid RegExp syntax (with or without the u flag), so it provides an opportunity to add syntax to cover set notation functionality without needing to introduce a new flag.
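
The claim that (?[ is currently rejected can be checked with a small probe. This is a minimal sketch; isValidPattern is a hypothetical helper, and the result assumes an engine that follows the current specification (one that adopted this syntax would report differently).

```javascript
// Hypothetical helper: returns true if the engine accepts the pattern source
// with the given flags, false if it throws a SyntaxError.
function isValidPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so surface it
  }
}

const extendedClass = "(?[ \\p{Emoji} - \\p{ASCII} ])";

console.log(isValidPattern(extendedClass, ""));   // false: rejected without the u flag
console.log(isValidPattern(extendedClass, "u"));  // false: rejected with the u flag
console.log(isValidPattern("(?:a)", "u"));        // true: an existing (?...) form still parses
```

Because the pattern throws in both modes, no existing valid regular expression would change meaning if the syntax were adopted.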
