Commit
change checksum of gcp download. add retry.
context: codecov/engineering-team#1029

As suggested in the ticket, we want to explore whether switching the checksum validation from 'md5' to 'crc32c' reduces the occurrence of `DataCorruption` errors. I switched the function to `download_as_bytes` (see [docs](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_download_as_bytes)) because the existing `download_as_string` is deprecated and doesn't provide the `checksum` argument.

I also added a retry mechanism that retries the download once in case of `DataCorruption`. This is based mostly on vibes and feelings, but my line of thought is that this _might_ be a temporary issue where one part of the system downloads a file while another part is updating it more or less concurrently. There are also legitimate reasons for packets to get lost in the network. If it is a temporary issue, a second attempt is fairly inexpensive and should yield good results.

Plus, we can test this theory by tracking the number of `DataCorruption` errors over time against the number of "we are retrying this" logs. If the number of logs is low, we know the change to 'crc32c' was effective. If the number of logs is high but the number of exceptions is low, we know that retrying is effective.
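For illustration, the retry described above could look roughly like the sketch below. This is not the code from this commit: the function name `download_with_one_retry`, the `corruption_error` parameter, and the log message are hypothetical. It assumes `blob` behaves like a `google.cloud.storage.Blob` and that the checksum failure surfaces as `google.resumable_media.DataCorruption`, which is passed in as a parameter so the sketch does not depend on the library being installed.

```python
import logging

log = logging.getLogger(__name__)


def download_with_one_retry(blob, corruption_error, max_attempts=2):
    """Download a blob's contents, retrying when checksum validation fails.

    Assumptions (not taken from the commit itself):
    - `blob` exposes `download_as_bytes(checksum=...)`, like
      google.cloud.storage.Blob does.
    - `corruption_error` is the exception class raised on a checksum
      mismatch, e.g. google.resumable_media.DataCorruption.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Validate the payload with crc32c instead of md5.
            return blob.download_as_bytes(checksum="crc32c")
        except corruption_error:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the error.
                raise
            # This log line is what would be counted against the number
            # of DataCorruption exceptions to judge the two changes.
            log.warning(
                "Checksum mismatch on download, retrying",
                extra={"attempt": attempt},
            )
```

With the default `max_attempts=2` the download is attempted at most twice, which matches the "retry once" behavior described in the message.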