Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

change checksum of gcp download. add retry. #117

Merged
merged 1 commit into from
Feb 1, 2024

Conversation

giovanni-guidini
Copy link
Contributor

context: codecov/engineering-team#1029

As suggested in the ticket we want to explore if switching the checksum
validation to 'crc32c' (from 'md5') reduces the occurance of
DataCorruption errors.

I switched the function to download_as_bytes (see docs) because the existing download_as_string is
deprecated and doesn't provide the cheksum argument.

I also added a retry mechanism to retry the download once in case
of DataCorruption. This is based on vibes and feelings mostly,
but my line of thought is that this might be a temporary issue
around some part of the system downloading a file and another part
updating it kinda concurrently. Also there are legitimate resons
that packets are lost through the network and everything.
If it is a temporary issue a secondary attempt is fairly inexpensive,
and should yield good results.

Plus we can see if this theory works by tracking the number of
issues with the DataCorruption error over time compared to the
number of logs for "we are retrying this".

If the number of logs is low we know the change to 'crc32c' was efficient.
If the number of logs is high but the number of exceptions is low
we know that retrying is effective.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

context: codecov/engineering-team#1029

As suggested in the ticket we want to explore if switching the checksum
validation to 'crc32c' (from 'md5') reduces the occurance of
`DataCorruption` errors.

I switched the function to `download_as_bytes` (see [docs](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_download_as_bytes)) because the existing `download_as_string` is
deprecated and doesn't provide the `cheksum` argument.

I also added a retry mechanism to retry the download once in case
of `DataCorruption`. This is based on vibes and feelings mostly,
but my line of thought is that this _might_ be a temporary issue
around some part of the system downloading a file and another part
updating it kinda concurrently. Also there are legitimate resons
that packets are lost through the network and everything.
If it is a temporary issue a secondary attempt is fairly inexpensive,
and should yield good results.

Plus we can see if this theory works by tracking the number of
issues with the `DataCorruption` error over time compared to the
number of logs for "we are retrying this".

If the number of logs is low we know the change to 'crc32c' was efficient.
If the number of logs is high but the number of exceptions is low
we know that retrying is effective.
Copy link

codecov-public-qa bot commented Jan 31, 2024

Codecov Report

Merging #117 (b7e90f1) into main (ab54697) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #117   +/-   ##
=======================================
  Coverage   89.51%   89.52%           
=======================================
  Files         115      115           
  Lines        9387     9392    +5     
  Branches     1497     1499    +2     
=======================================
+ Hits         8403     8408    +5     
  Misses        781      781           
  Partials      203      203           
Flag Coverage Δ
python3.10 88.71% <100.00%> (+<0.01%) ⬆️
python3.11 88.71% <100.00%> (+<0.01%) ⬆️
python3.12 88.71% <100.00%> (+<0.01%) ⬆️
python3.8 88.80% <100.00%> (+<0.01%) ⬆️
python3.9 89.06% <100.00%> (+<0.01%) ⬆️
rust 90.20% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
shared/storage/gcp.py 97.59% <100.00%> (+0.15%) ⬆️

Impacted file tree graph

@codecov-qa
Copy link

codecov-qa bot commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ab54697) 89.51% compared to head (b7e90f1) 89.52%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #117   +/-   ##
=======================================
  Coverage   89.51%   89.52%           
=======================================
  Files         115      115           
  Lines        9387     9392    +5     
  Branches     1497     1499    +2     
=======================================
+ Hits         8403     8408    +5     
  Misses        781      781           
  Partials      203      203           
Flag Coverage Δ
python3.10 88.71% <100.00%> (+<0.01%) ⬆️
python3.11 88.71% <100.00%> (+<0.01%) ⬆️
python3.12 88.71% <100.00%> (+<0.01%) ⬆️
python3.8 88.80% <100.00%> (+<0.01%) ⬆️
python3.9 89.06% <100.00%> (+<0.01%) ⬆️
rust 90.20% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@codecov-staging
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

Copy link

codecov bot commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ab54697) 91.53% compared to head (b7e90f1) 91.54%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #117   +/-   ##
=======================================
  Coverage   91.53%   91.54%           
=======================================
  Files         122      122           
  Lines        9408     9413    +5     
  Branches     1611     1613    +2     
=======================================
+ Hits         8612     8617    +5     
  Misses        781      781           
  Partials       15       15           
Flag Coverage Δ
python3.10 88.71% <100.00%> (+<0.01%) ⬆️
python3.11 88.71% <100.00%> (+<0.01%) ⬆️
python3.12 88.71% <100.00%> (+<0.01%) ⬆️
python3.8 88.80% <100.00%> (+<0.01%) ⬆️
python3.9 89.06% <100.00%> (+<0.01%) ⬆️
rust 90.20% <ø> (ø)
smart-labels 91.29% <100.00%> (-0.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't necessarily have to use the same checksum on upload and download, right? it's fine to just change this and all our existing blobs will still be accessible?

@@ -134,11 +134,16 @@ def read_file(self, bucket_name, path, file_obj=None):
blob.content_encoding = "gzip"
blob.patch()
if file_obj is None:
return blob.download_as_string()
return blob.download_as_bytes(checksum="crc32c")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i checked the docs, apparently both download_as_string() and download_as_bytes() return bytes. i thought download_as_string() might return str which would make this potentially blow up. but it's fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the commit message the download_as_string method is deprecated. It's just an alias for download_as_bytes.

Thanks for doulbe checking it for us :)

@giovanni-guidini
Copy link
Contributor Author

giovanni-guidini commented Jan 31, 2024

re @matt-codecov

we don't necessarily have to use the same checksum on upload and download, right? it's fine to just change this and all our existing blobs will still be accessible?

I hope so. If you look at the docs for the upload function you'll notice we actually don't use any checksum when uploading, so it should be fine. (because the default checksum is None and we don't provide one (see here))

But I'll test in staging before merging juuuust in case :P

@giovanni-guidini giovanni-guidini merged commit 7b98010 into main Feb 1, 2024
26 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/change-checksum-gcp-client branch February 1, 2024 16:05
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants