Fix cubic ReDoS in fenced code and references #1130

Merged
waylan merged 1 commit into Python-Markdown:master on May 7, 2021
Conversation

b-c-ds (Contributor) commented on May 7, 2021

Two regular expressions were vulnerable to Regular Expression Denial of Service (ReDoS).

Crafted strings containing a long sequence of spaces could cause Denial of Service by making markdown take a long time to process.

This represents a vulnerability when untrusted user input is processed with the markdown package.

ReferenceProcessor:

class ReferenceProcessor(BlockProcessor):
    """ Process link references. """
    RE = re.compile(
        r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*\n?[ ]*((["\'])(.*)\4|\((.*)\))?[ ]*$', re.MULTILINE
    )

e.g.:

import markdown
markdown.markdown('[]:0' + ' ' * 4321 + '0')

FencedBlockPreprocessor (requires fenced_code extension):

FENCED_BLOCK_RE = re.compile(
    dedent(r'''
        (?P<fence>^(?:~{3,}|`{3,}))[ ]*                      # opening fence
        ((\{(?P<attrs>[^\}\n]*)\})?|                         # (optional {attrs} or
        (\.?(?P<lang>[\w#.+-]*))?[ ]*                        # optional (.)lang
        (hl_lines=(?P<quot>"|')(?P<hl_lines>.*?)(?P=quot))?) # optional hl_lines)
        [ ]*\n                                               # newline (end of opening fence)
        (?P<code>.*?)(?<=\n)                                 # the code block
        (?P=fence)[ ]*$                                      # closing fence
    '''),
    re.MULTILINE | re.DOTALL | re.VERBOSE
)

e.g.:

import markdown
markdown.markdown('```' + ' ' * 4321, extensions=['fenced_code'])

Both regular expressions had cubic worst-case complexity, so doubling the number of spaces made processing take 8 times as long.
The cubic behaviour can be seen as follows:

$ time python -c "import markdown; markdown.markdown('[]:0' + ' ' * 1000 + '0')"
python -c "import markdown; markdown.markdown('[]:0' + ' ' * 1000 + '0')"  1.25s user 0.02s system 99% cpu 1.271 total
$ time python -c "import markdown; markdown.markdown('[]:0' + ' ' * 2000 + '0')"
python -c "import markdown; markdown.markdown('[]:0' + ' ' * 2000 + '0')"  9.01s user 0.02s system 99% cpu 9.040 total
$ time python -c "import markdown; markdown.markdown('[]:0' + ' ' * 4000 + '0')"
python -c "import markdown; markdown.markdown('[]:0' + ' ' * 4000 + '0')"  74.86s user 0.27s system 99% cpu 1:15.38 total
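
The timings above go through the whole markdown pipeline and only cover the reference regex. As a rough sketch, the two patterns can also be timed on their own with just the standard library. The patterns below are copied from the snippets quoted in this description; the helper name `time_search` and the input sizes are my own choices for illustration, and the absolute numbers will vary by machine.

```python
import re
import time
from textwrap import dedent

# Copied from ReferenceProcessor.RE as quoted in this description.
REFERENCE_RE = re.compile(
    r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*\n?[ ]*((["\'])(.*)\4|\((.*)\))?[ ]*$',
    re.MULTILINE
)

# Copied from FencedBlockPreprocessor.FENCED_BLOCK_RE as quoted in this description.
FENCED_BLOCK_RE = re.compile(
    dedent(r'''
        (?P<fence>^(?:~{3,}|`{3,}))[ ]*                      # opening fence
        ((\{(?P<attrs>[^\}\n]*)\})?|                         # (optional {attrs} or
        (\.?(?P<lang>[\w#.+-]*))?[ ]*                        # optional (.)lang
        (hl_lines=(?P<quot>"|')(?P<hl_lines>.*?)(?P=quot))?) # optional hl_lines)
        [ ]*\n                                               # newline (end of opening fence)
        (?P<code>.*?)(?<=\n)                                 # the code block
        (?P=fence)[ ]*$                                      # closing fence
    '''),
    re.MULTILINE | re.DOTALL | re.VERBOSE
)


def time_search(pattern, text):
    """Hypothetical helper: run pattern.search(text) once, return (match, seconds)."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


# Sizes are kept small so the sketch finishes quickly; each doubling should
# cost roughly 8x more once n is large enough to dominate the fixed overhead.
for n in (100, 200, 400):
    ref_match, ref_secs = time_search(REFERENCE_RE, '[]:0' + ' ' * n + '0')
    fence_match, fence_secs = time_search(FENCED_BLOCK_RE, '`' * 3 + ' ' * n)
    # Neither input is a valid reference or fence, so both searches must
    # exhaust every way of splitting the spaces before giving up.
    print(f"n={n}: reference {ref_secs:.4f}s (match={ref_match}), "
          f"fenced {fence_secs:.4f}s (match={fence_match})")
```

Both crafted inputs end in a failed match, which is what forces the engine to try every split of the spaces.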

Both regexes had three [ ]* groups separated by optional groups, in effect making the regex [ ]*[ ]*[ ]*.
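
To make the `[ ]*[ ]*[ ]*` point concrete, here is a simplified model rather than an exact count of what the `re` engine does. If the match is doomed to fail after a run of n spaces, a backtracking engine can end up trying every way for three adjacent groups to claim a prefix of that run, i.e. every triple (a, b, c) with a + b + c <= n. The function names are hypothetical.

```python
from math import comb


def split_count_bruteforce(n):
    """Count triples (a, b, c) of non-negative ints with a + b + c <= n."""
    count = 0
    for a in range(n + 1):              # spaces taken by the first [ ]*
        for b in range(n - a + 1):      # spaces taken by the second [ ]*
            count += n - a - b + 1      # choices left for the third [ ]*
    return count


def split_count(n):
    """Closed form for the same count: C(n + 3, 3), which grows like n**3 / 6."""
    return comb(n + 3, 3)


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, split_count(n), split_count(2 * n) / split_count(n))
```

The ratio between n and 2n tends to 8, which lines up with the roughly 8x jumps in the timings.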

Discovered using doyensec/regexploit.
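
For illustration only, one way to remove this kind of ambiguity in the reference pattern is to make sure no two `[ ]*` groups can compete for the same spaces after the URL. The sketch below is a hypothetical rewrite, not necessarily the pattern merged in this PR: the second `[ ]*` is only reachable after a newline, and the trailing `[ ]*` is only reachable after a title. It leaves the `[ ]*\n?[ ]*` before the URL untouched, so it is not a claim that every source of backtracking is gone.

```python
import re

# Copied from ReferenceProcessor.RE as quoted in this description.
ORIGINAL_RE = re.compile(
    r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*\n?[ ]*((["\'])(.*)\4|\((.*)\))?[ ]*$',
    re.MULTILINE
)

# Hypothetical rewrite (an assumption for illustration, not the merged fix):
#   [ ]*\n?[ ]*   ->  [ ]*(?:\n[ ]*)?    second run of spaces requires a newline
#   (title)?[ ]*$ ->  (title[ ]*)?$      trailing spaces belong to the title branch
REWRITTEN_RE = re.compile(
    r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*(?:\n[ ]*)?((["\'])(.*)\4[ ]*|\((.*)\)[ ]*)?$',
    re.MULTILINE
)

SAMPLES = [
    '[id]: http://example.com',
    '[id]: http://example.com   ',
    '[id]: http://example.com "Title"',
    "[id]: http://example.com 'Title'  ",
    '[id]: http://example.com (Title)',
    '[id]: http://example.com\n    "Title"',
    'not a reference',
]


def summarize(match):
    """Return the parts of a match that callers would care about, or None."""
    if match is None:
        return None
    # Groups: 1 = id, 2 = URL, 5 = quoted title, 6 = parenthesized title.
    return (match.group(0), match.group(1), match.group(2),
            match.group(5), match.group(6))


if __name__ == "__main__":
    for sample in SAMPLES:
        same = summarize(ORIGINAL_RE.search(sample)) == summarize(REWRITTEN_RE.search(sample))
        print(repr(sample), 'same result' if same else 'DIFFERENT')
    # The crafted input from this description, at a size that would take the
    # original pattern far too long to reject.
    print(REWRITTEN_RE.search('[]:0' + ' ' * 20000 + '0'))
```

On the crafted input, each backtracking step of the remaining `[ ]*` fails immediately, so the rejection should be linear in the number of spaces.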

@waylan waylan merged commit eacff47 into Python-Markdown:master May 7, 2021