Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

MediaIoBaseDownload next_chunk() Range header off-by-one #1593

Closed
jvtm opened this issue Nov 7, 2021 · 2 comments · Fixed by #1595
Closed

MediaIoBaseDownload next_chunk() Range header off-by-one #1593

jvtm opened this issue Nov 7, 2021 · 2 comments · Fixed by #1595
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@jvtm
Copy link
Contributor

jvtm commented Nov 7, 2021

Environment details

  • OS type and version:
Ubuntu 20.04.3 LTS
  • Python version: python --version
Python 3.9.7
  • pip version: pip --version
pip 21.3.1
  • google-api-python-client version: pip show google-api-python-client
Name: google-api-python-client
Version: 2.29.0
Summary: Google API Client Library for Python
Home-page: https://github.com/googleapis/google-api-python-client/
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Requires: google-api-core, google-auth, google-auth-httplib2, httplib2, uritemplate

Steps to reproduce

  1. Use MediaIoBaseDownload(..., chunksize=1024) for downloading a file from Google Drive (see https://developers.google.com/drive/api/v3/manage-downloads#python)
  2. Received chunk is 1025 bytes
  3. See examples / behavior defined in https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier

Code example

import logging
from io import BytesIO

logging.basicConfig(level="DEBUG")

# example / skipping drive, auth setup for obvious reasons

file_id = "foobar123456789"
chunk_size = 1024
file_obj = BytesIO()
request = drive.files().get_media(fileId=file_id)
downloader =  MediaIoBaseDownload(file_obj, request, chunksize=chunk_size)
done = False
while not done:
    # debug log the lower level HTTP request headers?
    status, done = downloader.next_chunk()
    logging.debug(
        "Download status %r %s/%s bytes (%.1f%%)",
        file_id,
        status.resumable_progress,
        status.total_size,
        status.progress() * 100,
    )
    if not done:
        assert file_obj.tell() == status.resumable_progress
        assert file_obj.tell() % chunk_size == 0
    else:
        assert file_obj.tell() == status.total_size

In real life chunk size should be few megabytes at least. Default seems to be 100MiB:

googleapiclient/http.py:DEFAULT_CHUNK_SIZE = 100 * 1024 * 1024

Stack trace

.

Patch

Is it worth making a PR, signing CLA? :D

diff --git a/googleapiclient/http.py b/googleapiclient/http.py
index 1b661e1b..927464e9 100644
--- a/googleapiclient/http.py
+++ b/googleapiclient/http.py
@@ -733,7 +733,7 @@ class MediaIoBaseDownload(object):
         headers = self._headers.copy()
         headers["range"] = "bytes=%d-%d" % (
             self._progress,
-            self._progress + self._chunksize,
+            self._progress + self._chunksize - 1,
         )
         http = self._request.http
 
@parthea parthea added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Nov 8, 2021
@parthea parthea self-assigned this Nov 8, 2021
@parthea parthea added the status: investigating The issue is under investigation, which is determined to be non-trivial. label Nov 8, 2021
@parthea
Copy link
Contributor

parthea commented Nov 8, 2021

Hi @jvtm , I'm looking into it. In the meantime, please feel free to open a PR.

jvtm added a commit to jvtm/google-api-python-client that referenced this issue Nov 8, 2021
Issue: It looks like Range header end value was constructed by adding chunk size to
current position. This leads into the request being one byte longer, and also
the received response body is one byte longer than anticipated.

Fix: adjust the end to be one byte earlier.

For example with `chunksize=1024` and current position being 0, the range header
should be set to `bytes=0-1023`.

See: https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier

Resolves: googleapis#1593
jvtm added a commit to jvtm/google-api-python-client that referenced this issue Nov 8, 2021
Issue: It looks like Range header end value was constructed by adding chunk size to
current position. This leads into the request being one byte longer, and also
the received response body is one byte longer than anticipated.

Fix: adjust the end to be one byte earlier.

For example with `chunksize=1024` and current position being 0, the range header
should be set to `bytes=0-1023`.

See: https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier

Resolves: googleapis#1593
@jvtm
Copy link
Contributor Author

jvtm commented Nov 9, 2021

PR opened, CLA signed

jvtm added a commit to jvtm/google-api-python-client that referenced this issue Nov 9, 2021
Issue: It looks like Range header end value was constructed by adding chunk size to
current position. This leads into the request being one byte longer, and also
the received response body is one byte longer than anticipated.

Fix: adjust the end to be one byte earlier.

For example with `chunksize=1024` and current position being 0, the range header
should be set to `bytes=0-1023`.

See: https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier

Resolves: googleapis#1593
gcf-merge-on-green bot pushed a commit that referenced this issue Nov 11, 2021
Issue: It looks like Range header end value was constructed by adding chunk size to
current position. This leads into the request being one byte longer, and also
the received response body is one byte longer than anticipated.

Fix: adjust the end to be one byte earlier.

For example with `chunksize=1024` and current position being 0, the range header
should be set to `bytes=0-1023`.

See: https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier

Fixes #1593

----

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/google-api-python-client/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)

Fixes #1593 🦕
@parthea parthea removed the status: investigating The issue is under investigation, which is determined to be non-trivial. label Nov 11, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants