Near-Future Work #370

Open
milseman opened this issue Apr 29, 2022 · 0 comments
milseman commented Apr 29, 2022

I want to gather up many areas of near-future work that we've been clarifying through the proposal reviews.

Loose categorization:

Language and integration

  • Ability to use a String-backed, CaseIterable enum as a regex component
  • Define error types for compilation and type mismatches
  • Callouts from literals
  • A Regex-backed enum that will construct a ChoiceOf all cases in order
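
Until either enum feature exists, one stopgap is to build the alternation at run time from the raw values. A minimal sketch, assuming a hypothetical helper `alternation(of:)` and an example enum `Direction` (neither is library API). It relies on the existing `Regex(_:as:)` initializer and on Foundation's `NSRegularExpression.escapedPattern(for:)`; that the ICU-style escaping is also valid for every Swift regex metacharacter is an assumption, though it is a no-op for plain alphanumeric raw values like these:

```swift
import Foundation

enum Direction: String, CaseIterable {
    case north, south, east, west
}

// Hypothetical helper, not library API: joins the raw values of all cases,
// in declaration order, into a single alternation pattern.
func alternation<E>(of type: E.Type) throws -> Regex<Substring>
where E: CaseIterable & RawRepresentable, E.RawValue == String {
    let pattern = E.allCases
        .map { NSRegularExpression.escapedPattern(for: $0.rawValue) }
        .joined(separator: "|")
    return try Regex(pattern, as: Substring.self)
}

let direction = try alternation(of: Direction.self)

// Map the matched text back to the enum case.
let found = "head north now".firstMatch(of: direction)
    .flatMap { Direction(rawValue: String($0.output)) }
```

Unlike the proposed features, this produces a `Substring` output rather than the enum itself, so the conversion back to `Direction` has to happen after matching.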

API

  • Ability to map over a regex, perhaps per-capture, to supply post-processing transforms at regex declaration time
  • A modifier on a regex to convert it to matches-anywhere semantics
    • E.g. regex.matchingAnywhere => Regex { /.*?/ ; regex ; /.*/ }.
    • But we'd preserve the matched range, i.e. reset start/end position
  • Character alignment queries
    • API for whether start/end is Character-aligned for whole match and each capture
  • API to query options (e.g. is this case insensitive?)
  • API for (?n): it could be nice to strip out captures you don't care about, especially for type-erased regexes.
    • Compilation error if there are back-references or if it changes the semantics of the program
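
To illustrate why the matches-anywhere modifier would need to reset the start/end position, here is a minimal sketch of the naive expansion using today's `RegexBuilder`. The names `word`, `anywhere`, and `input` are only for illustration; `matchingAnywhere` itself does not exist yet:

```swift
import RegexBuilder

let word = #/[a-z]+@[a-z]+/#

// Naive expansion from the bullet above: lazy prefix, the regex, greedy suffix.
let anywhere = Regex {
    #/.*?/#
    word
    #/.*/#
}

let input = "to: ab@cd (work)"

// Succeeds as a whole match, but the reported output spans all of `input`.
let whole = input.wholeMatch(of: anywhere)

// The range a matches-anywhere modifier would want to report instead.
let inner = input.firstMatch(of: word)
```

With the naive expansion, callers lose the position of the interesting part unless they add a capture around it, which is what preserving the matched range would avoid.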

Algorithms

  • Add a replace(_:withTemplate:) method that recognizes $1 or \1 placeholders
  • A separator-preserving split variant
  • Suffix / from-the-end operations (trim etc)
  • Customize search
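
The template-based replace can be approximated today on top of the closure-based `replacing(_:with:)` and a type-erased regex. A rough sketch, assuming a hypothetical helper `expand(_:with:)` and a hypothetical `replacing(_:withTemplate:)` signature; it only handles single-digit `$0` through `$9` placeholders and ignores the `\1` spelling and escaping:

```swift
// Hypothetical helper: expands "$0"..."$9" in `template` using the captures
// of a single match. Any other character is copied through unchanged.
func expand(_ template: String, with captures: AnyRegexOutput) -> String {
    var result = ""
    var rest = Substring(template)
    while let ch = rest.popFirst() {
        if ch == "$", let next = rest.first, next.isASCII,
           let n = next.wholeNumberValue, n < captures.count {
            // A capture that did not participate in the match expands to nothing.
            if let text = captures[n].substring { result += text }
            rest.removeFirst()
        } else {
            result.append(ch)
        }
    }
    return result
}

extension String {
    // Hypothetical name and signature, layered on the existing closure-based API.
    func replacing(_ regex: Regex<AnyRegexOutput>, withTemplate template: String) -> String {
        replacing(regex) { match in expand(template, with: match.output) }
    }
}

let swapped = try "alice@example".replacing(Regex(#"(\w+)@(\w+)"#), withTemplate: "$2 at $1")
```

A real design would also need to decide how to treat out-of-range placeholders and literal `$` characters; this sketch simply copies them through.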

String and Unicode

  • Add unsupported Unicode properties to Unicode.Properties and support in regexes
  • Add Unicode.AllScalars as a public type (semi-tangential)
  • Add var Substring.range: Range<String.Index> to simplify getting the range of a capture group
  • Inits for making a NFC string from UTF-8
  • String.lines() and String.words()
  • Add option for canonical equivalence in scalar-semantic mode
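
The `Substring.range` idea is small enough to sketch directly. This is an illustrative extension, not something in the standard library today, and `text`, `match`, and `valueRange` are example names:

```swift
extension Substring {
    // Sketch of the proposed property: the range this slice occupies in its base string.
    var range: Range<String.Index> { startIndex..<endIndex }
}

let text = "key=value"
let match = text.firstMatch(of: #/(\w+)=(\w+)/#)!

// A Substring shares indices with its base, so this range is valid in `text`.
let valueRange = match.2.range
```

Optional captures have type `Substring?`, so callers would reach the property through optional chaining there.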

Dynamic Regex API

  • Add a capture-description API to all regexes
    • some RandomAccessCollection of captures, each of which has a type and optionality
  • Missing match conversions
    • Regex<T>.Match.init?(_:ARO)
    • Regex<T>.Match.init?(_:Regex<ARO>.Match)
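
For context, capture information is currently reachable only through a match, by walking the `AnyRegexOutput` collection. A small sketch of that existing route, with an example pattern chosen for illustration; a capture-description API would expose similar information on the regex itself, before any matching:

```swift
// A regex created from a string at run time has type Regex<AnyRegexOutput>.
let regex = try Regex(#"(?<year>\d{4})-(\d{2})"#)

if let match = "2022-04".wholeMatch(of: regex) {
    // Element 0 is the whole match; the rest are the captures in order.
    for (index, capture) in match.output.enumerated() {
        print(index, capture.name ?? "(unnamed)", capture.substring ?? "(no match)")
    }
}
```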

Builders

  • A high-level helper for separated/quoted repetitions, e.g. Repeat(separator: \.whitespace) { ... }
  • A helper for repeated matching lookahead and negative lookahead, e.g. Repeat(while:) Repeat(whileNot:)
    • Until(negLookaheadCondition) { ... }
  • A func compile() throws to explicitly trigger compilation and get errors, such as quantifying the unquantifiable
    • This is useful when composing regexes together to check the final result instead of trapping at run time.
  • Default Reference capture type to Substring.self

Engine

  • Engine limiters, low-level backtracking control and timeouts
  • Provide a way to access all values of a repeated capture (e.g. subscribe)
  • Conditionals (?(x)...) (requires updated parsing)
  • Quoted string inside custom character classes (e.g. [a-z\q{ch}])

Parser

  • Support for duplicate group names through (?J) (requires figuring out typed captures)
  • Support for branch reset alternations (?|) (parsing is implemented, but requires figuring out typed captures)
  • Parsing of conditionals (?(x)...) in accordance with what is in the syntax proposal (we currently parse the condition differently)
    • Including interpolation conditions (?(?{...}))
    • Conditional conditions don't capture on their own, only for child nodes, e.g. (?((x))x). .NET also forbids named capture conditions; we should ban those too.
    • Stop parsing named reference conditions for (?(x)...)
    • Don't allow (?(DEFINE)) to have a false branch
  • Support for regex property values \p{key=/regex/}
  • Support for transform matching, e.g. \p{toNFKC_Casefold=@toNFKC@}
  • Support for alternative character property separators?
    • UTS#18 suggests key≠value, key!=value
    • Perl allows key:value
  • Support a** syntax as explicitly eager quantification
    • I.e. it's not affected by API to change default quantification kind, (probably) not affected by (?U)