
Azure storage upload fails on 267 GiB file with Azure ErrorCode: BlockListTooLong #705

Closed
mikael-ngenic opened this issue Jan 16, 2024 · 2 comments · Fixed by #708

mikael-ngenic commented Jan 16, 2024


stdout:

$ medusa backup --backup-name test5-$(date +%Y%m%d)
[2024-01-15 15:26:45,666] INFO: Resolving ip address
[2024-01-15 15:26:45,670] INFO: ip address to resolve 10.0.0.4
[2024-01-15 15:26:45,673] INFO: Registered backup id test5-20240115
[2024-01-15 15:26:45,673] INFO: Monitoring provider is noop
[2024-01-15 15:26:45,839] WARNING: ssl_storage_port is deprecated as of Apache Cassandra 4.x
[2024-01-15 15:26:46,029] INFO: Starting backup using Stagger: None Mode: differential Name: test5-20240115
[2024-01-15 15:26:46,030] INFO: Updated from existing status: -1 to new status: 0 for backup id: test5-20240115
[2024-01-15 15:26:46,030] INFO: Saving tokenmap and schema
[2024-01-15 15:26:46,319] INFO: Resolving ip address 10.0.0.4
[2024-01-15 15:26:46,319] INFO: ip address to resolve 10.0.0.4
[2024-01-15 15:26:46,320] INFO: Resolving ip address 10.0.0.4
[2024-01-15 15:26:46,320] INFO: ip address to resolve 10.0.0.4
[2024-01-15 15:26:46,359] INFO: Saving server version
[2024-01-15 15:26:46,800] INFO: Node {host}.internal.cloudapp.net does not have latest backup
[2024-01-15 15:26:46,800] INFO: Creating snapshot
[2024-01-15 16:44:55,796] ERROR: Error occurred during backup: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
ErrorCode:BlockListTooLong
Content: BlockListTooLong: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
Traceback (most recent call last):
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/backup_node.py", line 381, in backup_snapshots
    manifest_objects += storage.storage_driver.upload_blobs(needs_backup, dst_path)
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/storage/abstract_storage.py", line 170, in upload_blobs
    manifest_objects = loop.run_until_complete(self._upload_blobs(srcs, dest))
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/storage/abstract_storage.py", line 178, in _upload_blobs
    manifest_objects += await asyncio.gather(*chunk)
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/storage/azure_storage.py", line 185, in _upload_blob
    blob_client = await self.azure_container_client.upload_blob(
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/aio/_container_client_async.py", line 952, in upload_blob
    await blob.upload_blob(
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 414, in upload_blob
    return await upload_block_blob(**options)
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 172, in upload_block_blob
    process_storage_error(error)
  File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
    exec("raise error from None")  # pylint: disable=exec-used  # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
ErrorCode:BlockListTooLong
Content: BlockListTooLong: The block list may not contain more than 50,000 blocks.

/var/log/medusa/medusa.log

[....]
[2024-01-15 15:45:49,266] DEBUG: [Azure Storage] Uploading /data/cassandra/{name}/time_series-a69de050b30d11e6afaa8545e2868465/snapshots/medusa-test5-20240115/nb-11257-big-Data.db (267.339GiB) -> azure://cassandra-backups/{host}.internal.cloudapp.net/data/{name}/time_series-a69de050b30d11e6afaa8545e2868465/nb-11257-big-Data.db
[2024-01-15 16:44:55,796] ERROR: Error occurred during backup: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
ErrorCode:BlockListTooLong
Content: BlockListTooLong: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
[2024-01-15 16:44:55,844] DEBUG: Cleaning up Cassandra snapshot
[...]

Comments

The Azure block blob limit is 50,000 blocks per blob. [1]
The azure-storage-blob library uses a default block size of 4 MiB. [2][3]
=> maximum blob size of approximately 195 GiB (4 MiB × 50,000 blocks), well below the 267 GiB file above.
Block size tuning via max_block_size is therefore required for larger files. [4]
azure-cli solves this with _adjust_block_blob_size, which grows the block size with the file size; a sketch of the same idea follows the references below. [5]

[1] https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs
[2] "max_block_size (int): The maximum chunk size for uploading a block blob in chunks. Defaults to 4*1024*1024, or 4MB." https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python
[3] ":param int max_block_size: The maximum chunk size for uploading a block blob in chunks. Defaults to 4*1024*1024, or 4MB." https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/storage/azure-storage-blob/azure/storage/blob/_shared/models.py#L543
[4] https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-tune-upload-download-python#set-transfer-options-for-uploads
[5] https://github.com/Azure/azure-cli/blob/main/src/azure-cli/azure/cli/command_modules/storage/operations/blob.py#L571
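
To make the arithmetic concrete, here is a minimal sketch of that azure-cli idea in Python, assuming only the 50,000-block limit [1] and the 4 MiB default [2][3]; the helper name adjust_block_size is hypothetical and only loosely mirrors _adjust_block_blob_size [5]:

import math

MAX_BLOCK_COUNT = 50_000              # Azure block blob limit [1]
DEFAULT_BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB azure-storage-blob default [2][3]

def adjust_block_size(file_size: int) -> int:
    # Double the block size until the file fits within 50,000 blocks.
    block_size = DEFAULT_BLOCK_SIZE
    while math.ceil(file_size / block_size) > MAX_BLOCK_COUNT:
        block_size *= 2
    return block_size

# The failing ~267 GiB file fits once blocks grow to 8 MiB:
# ceil(267 GiB / 8 MiB) = 34,176 blocks <= 50,000.
print(adjust_block_size(267 * 1024 ** 3))  # 8388608 (8 MiB)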

Environment

$ /usr/bin/medusa --version
0.17.1
$ cassandra -v
4.0.11
$ python --version
Python 2.7.18
$ python3 --version
Python 3.8.10
$ az version
{
"azure-cli": "2.56.0",
"azure-cli-core": "2.56.0",
"azure-cli-telemetry": "1.1.0",
"extensions": {}
}

@rzvoncek (Contributor) commented

Hi @mikael-ngenic! Thanks for reporting this. I was under the impression that the Python SDK we use handles the chunking automatically as well, because the docs state so.

I guess "Creates a new blob from a data source with automatic chunking" does not imply automatic chunk sizing after all.

I'll look into a fix for this shortly.

@rzvoncek (Contributor) commented

Hello again. I spent some time trying to set a bigger chunk size per blob, but that is not straightforward at all in Azure's SDK. So in #708 we went for setting the bigger chunks globally; a sketch of that approach follows below. With that change I was able to back up a 266 GB file, and it should now handle files up to ~1 TB.

We'll be releasing medusa 0.17.2, which will include this fix. Please stay tuned.
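
For context, here is a minimal sketch of setting the transfer options globally via azure-storage-blob's documented keyword arguments [4]. The 20 MiB value is an assumption chosen to illustrate the ~1 TB ceiling (50,000 blocks × 20 MiB ≈ 0.95 TiB), not necessarily what #708 uses:

from azure.storage.blob.aio import BlobServiceClient

# "<connection-string>" is a placeholder. max_block_size and
# max_single_put_size are the documented transfer-option kwargs [4];
# with 20 MiB blocks (an assumed value) the per-blob ceiling becomes
# 50,000 x 20 MiB ~= 0.95 TiB.
service_client = BlobServiceClient.from_connection_string(
    "<connection-string>",
    max_block_size=20 * 1024 * 1024,       # block size for chunked uploads
    max_single_put_size=20 * 1024 * 1024,  # uploads above this are chunked
)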

@rzvoncek rzvoncek self-assigned this Jan 23, 2024
@rzvoncek rzvoncek moved this to Ready For Review in K8ssandra Jan 23, 2024
@adejanovski adejanovski added the ready-for-review Issues in the state 'ready-for-review' label Jan 23, 2024
@github-project-automation github-project-automation bot moved this from Ready For Review to Done in K8ssandra Jan 23, 2024
@adejanovski adejanovski added done Issues in the state 'done' and removed ready-for-review Issues in the state 'ready-for-review' labels Jan 23, 2024