Skip to content

urllib3 2.0: Double reading causes IncompleteRead error #295

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
frostming opened this issue May 7, 2023 · 0 comments · Fixed by #296
Closed

urllib3 2.0: Double reading causes IncompleteRead error #295

frostming opened this issue May 7, 2023 · 0 comments · Fixed by #296
Assignees

Comments

@frostming
Copy link
Contributor

frostming commented May 7, 2023

In the Serializer.dumps() method, the underlying fp is replaced with a new, never read stream, which causes the stream to be double read: https://github.com/ionrock/cachecontrol/blob/c05ef9eff1c9ac176481fb99e7a7188aa5b4e17b/cachecontrol/serialize.py#L35-L36

This will lead to an IncompleteRead error on urllib3 2.0 since it starts to strictly check the HTTPResponse.length_remaining attribute which will be a negative value(content_length - 2*content_length = -content_length).

I have no idea why this replacement is necessary.

This can be reproduced with the following minimal reproducible example:

from cachecontrol import CacheControl
from cachecontrol.serialize import Serializer as LegacySerializer
from requests import Session


class Serializer(LegacySerializer):
    def dumps(self, request, response, body=None):
        if not hasattr(response, "strict"):
            # XXX: urllib3 2.0 removes this attribute
            response.strict = False
        return super().dumps(request, response, body)

    def prepare_response(self, request, cached, body_file=None):
        # We don't need to pass strict to HTTPResponse
        cached["response"].pop("strict", None)
        return super().prepare_response(request, cached, body_file)


s = CacheControl(Session(), serializer=Serializer())
url = "https://pypi.org/simple/pdm-backend/"
headers = {"Cache-Control": "max-age=0"}
s.get(url, headers=headers).content
resp = s.get(url, headers=headers)
assert resp.from_cache
resp.content

The error:

Traceback (most recent call last):
  File "/Users/fming/wkspace/github/pdm/venv/lib/python3.10/site-packages/urllib3/response.py", line 705, in _error_catcher
    yield
  File "/Users/fming/wkspace/github/pdm/venv/lib/python3.10/site-packages/urllib3/response.py", line 830, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(4658 bytes read, -2329 more expected)

...(more tracebacks omitted)

Note that the read size 4658 is twice the content length 2329

Related to #264

frostming added a commit to frostming/cacheyou that referenced this issue May 8, 2023
Fixes psf#295

Signed-off-by: Frost Ming <me@frostming.com>
frostming added a commit to frostming/cacheyou that referenced this issue May 8, 2023
Fixes psf#295

Signed-off-by: Frost Ming <me@frostming.com>
frostming added a commit to frostming/cacheyou that referenced this issue May 18, 2023
Fixes psf#295

Signed-off-by: Frost Ming <me@frostming.com>
frostming added a commit to frostming/cacheyou that referenced this issue May 18, 2023
Fixes psf#295

Signed-off-by: Frost Ming <me@frostming.com>
woodruffw pushed a commit to woodruffw-forks/cachecontrol that referenced this issue May 22, 2023
Fixes psf#295

Signed-off-by: Frost Ming <me@frostming.com>
@frostming frostming self-assigned this May 25, 2023
frostming added a commit to frostming/cacheyou that referenced this issue May 25, 2023
Fixes psf#295

Signed-off-by: Frost Ming <me@frostming.com>
frostming added a commit that referenced this issue May 29, 2023
Fixes #295

Signed-off-by: Frost Ming <me@frostming.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
1 participant