Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Detect super-linear worst-case runtime #159

Open
RunDevelopment opened this issue Apr 19, 2021 · 0 comments
Open

Detect super-linear worst-case runtime #159

RunDevelopment opened this issue Apr 19, 2021 · 0 comments
Assignees
Labels
enhancement New feature or request new rule

Comments

@RunDevelopment
Copy link
Collaborator

RunDevelopment commented Apr 19, 2021

I think we should add a rule that reports cases for exponential backtracking and polynomial backtracking.

However, there are two problems standing in our way:

Performance

Detecting exponential backtracking is computationally expensive.

Static analysis methods might only take a few milliseconds per regex but this will quickly add up for a large codebase (e.g. Prism has about 2k regexes). Fuzzers will usually take about one second or more per regex and are completely impractical for our purposes.

Solutions

  • Since the detection itself is deterministic, a simple cache might largely solve the performance problem.

    However, I am not aware of ESLint providing such a feature.

Detection

I am not aware of any existing detector implementation in JavaScript (time of writing: 2021-04-19).

safe-regex isn't practical. It implements a detection method that simply doesn't work. Star height has little to do with exponential runtime. I.e. (a|a)*b has a star height of 1 and the (a|a)* will cause exponential backtracking, and (ab+)*c has a star height of 2 without any form of exponential backtracking.

Solutions

  • Right now, we could use my library scslre to detect at least some polynomial and exponential backtracking quickly. scslre even offers fixes for simple cases.

  • There is also this method I implemented for Prism. This method is currently used in PrismJS and Highlight.JS. The upside of this method is that it is quick. It can analyze the >2.5k regexes of Prism in less than 5 seconds. The downside of this method is that there are false negatives and it cannot offer fixes. However, the positives it does find are mostly true positives.

    I should note that half of this method is already partially implemented by the no-dupe-disjunctions rule. If we find two alternatives that are not disjoint and both are children of a star quantifier, then we can reasonably assume that this will cause exponential backtracking (e.g. (a|a)*). Changing the no-dupe-disjunctions to add this information should be fairly easy.

  • Given that refa recently added ENFAs (not published yet), someone could probably implement a detection method like RXXR fairly easily.

  • I am currently writing my Bachelor thesis on a new static analysis method for exponential and polynomial backtracking. Part of this will be a JS library implementing my new detection method. We could also use this when it's ready (probably in 3-4 months or so).


Related:

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request new rule
Projects
None yet
Development

No branches or pull requests

1 participant