stdout:
$ medusa backup --backup-name test5-$(date +%Y%m%d)
[2024-01-15 15:26:45,666] INFO: Resolving ip address
[2024-01-15 15:26:45,670] INFO: ip address to resolve 10.0.0.4
[2024-01-15 15:26:45,673] INFO: Registered backup id test5-20240115
[2024-01-15 15:26:45,673] INFO: Monitoring provider is noop
[2024-01-15 15:26:45,839] WARNING: ssl_storage_port is deprecated as of Apache Cassandra 4.x
[2024-01-15 15:26:46,029] INFO: Starting backup using Stagger: None Mode: differential Name: test5-20240115
[2024-01-15 15:26:46,030] INFO: Updated from existing status: -1 to new status: 0 for backup id: test5-20240115
[2024-01-15 15:26:46,030] INFO: Saving tokenmap and schema
[2024-01-15 15:26:46,319] INFO: Resolving ip address 10.0.0.4
[2024-01-15 15:26:46,319] INFO: ip address to resolve 10.0.0.4
[2024-01-15 15:26:46,320] INFO: Resolving ip address 10.0.0.4
[2024-01-15 15:26:46,320] INFO: ip address to resolve 10.0.0.4
[2024-01-15 15:26:46,359] INFO: Saving server version
[2024-01-15 15:26:46,800] INFO: Node {host}.internal.cloudapp.net does not have latest backup
[2024-01-15 15:26:46,800] INFO: Creating snapshot
[2024-01-15 16:44:55,796] ERROR: Error occurred during backup: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
ErrorCode:BlockListTooLong
Content: BlockListTooLong. The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
Traceback (most recent call last):
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/backup_node.py", line 381, in backup_snapshots
manifest_objects += storage.storage_driver.upload_blobs(needs_backup, dst_path)
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/storage/abstract_storage.py", line 170, in upload_blobs
manifest_objects = loop.run_until_complete(self._upload_blobs(srcs, dest))
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/storage/abstract_storage.py", line 178, in _upload_blobs
manifest_objects += await asyncio.gather(*chunk)
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/medusa/storage/azure_storage.py", line 185, in _upload_blob
blob_client = await self.azure_container_client.upload_blob(
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/aio/_container_client_async.py", line 952, in upload_blob
await blob.upload_blob(
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
return await func(*args, **kwargs)
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 414, in upload_blob
return await upload_block_blob(**options)
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/aio/_upload_helpers.py", line 172, in upload_block_blob
process_storage_error(error)
File "/usr/share/cassandra-medusa/lib/python3.8/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
exec("raise error from None") # pylint: disable=exec-used # nosec
File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
ErrorCode:BlockListTooLong
Content: BlockListTooLong. The block list may not contain more than 50,000 blocks.
/var/log/medusa/medusa.log
[....]
[2024-01-15 15:45:49,266] DEBUG: [Azure Storage] Uploading /data/cassandra/{name}/time_series-a69de050b30d11e6afaa8545e2868465/snapshots/medusa-test5-20240115/nb-11257-big-Data.db (267.339GiB) -> azure://cassandra-backups/{host}.internal.cloudapp.net/data/{name}/time_series-a69de050b30d11e6afaa8545e2868465/nb-11257-big-Data.db
[2024-01-15 16:44:55,796] ERROR: Error occurred during backup: The block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
ErrorCode:BlockListTooLong
Content: BlockListTooLongThe block list may not contain more than 50,000 blocks.
RequestId:b1ea17a9-801e-0068-68d2-470b8a000000
Time:2024-01-15T16:44:55.7830927Z
[2024-01-15 16:44:55,844] DEBUG: Cleaning up Cassandra snapshot
[...]
Comments
Azure's block blob limit is 50,000 blocks per blob. [1]
The azure-storage-blob library uses a default block size of 4 MiB. [2][3]
=> maximum upload size is therefore approximately 195 GiB (4 MiB x 50,000 blocks).
Tuning the block size via max_block_size may be required for larger files. [4]
azure-cli appears to solve this with _adjust_block_blob_size. [5]
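The arithmetic above can be checked with a short sketch (the 50,000-block limit and the 4 MiB default come from the references; the file size is the one from the log):

```python
MAX_BLOCKS = 50_000      # Azure block blob limit [1]
MiB = 1024 ** 2
GiB = 1024 ** 3

def max_upload_bytes(block_size: int) -> int:
    """Largest blob that fits into MAX_BLOCKS blocks of block_size bytes."""
    return MAX_BLOCKS * block_size

default_block = 4 * MiB  # azure-storage-blob default max_block_size [2][3]
ceiling = max_upload_bytes(default_block)
print(f"ceiling with 4 MiB blocks: {ceiling / GiB:.2f} GiB")  # 195.31 GiB

failing_file = int(267.339 * GiB)  # the SSTable from the log above
print(failing_file > ceiling)      # True -> BlockListTooLong
```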
Hello again. I spent some time trying to set a bigger chunk size per blob, but that is not straightforward at all in the Azure SDK. So in #708 we went for setting bigger chunks globally. With that change, I was able to back up a 266 GB file. It should now handle files up to ~1 TB.
We'll be releasing medusa 0.17.2, which should include this fix. Please stay tuned.
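A quick sanity check on the "~1 TB" figure: since the block count is capped at 50,000, the chunk size determines the ceiling. The 20 MiB value below is purely illustrative (the actual value chosen in #708 is not shown here), but it reproduces the ~1 TB estimate:

```python
import math

MAX_BLOCKS = 50_000
MiB = 1024 ** 2
TiB = 1024 ** 4

def min_block_size(target_bytes: int) -> int:
    """Smallest block size that fits target_bytes into MAX_BLOCKS blocks."""
    return math.ceil(target_bytes / MAX_BLOCKS)

# ~21 MiB blocks are needed to cover a full 1 TiB blob
print(min_block_size(1 * TiB) / MiB)  # ~20.97

# A hypothetical global setting of 20 MiB blocks caps uploads at ~0.95 TiB,
# consistent with the "up to ~1 TB" estimate above:
print(MAX_BLOCKS * 20 * MiB / TiB)    # ~0.95
```

In azure-storage-blob this knob is the `max_block_size` keyword accepted by the client constructors [2][3]; "globally" here means setting it once on the client rather than per upload call.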
[1] https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs
[2] max_block_size int The maximum chunk size for uploading a block blob in chunks. Defaults to 4*1024*1024, or 4MB.
https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python
[3] :param int max_block_size: The maximum chunk size for uploading a block blob in chunks. Defaults to 4*1024*1024, or 4MB.
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/storage/azure-storage-blob/azure/storage/blob/_shared/models.py#L543
[4] https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-tune-upload-download-python#set-transfer-options-for-uploads
[5] https://github.com/Azure/azure-cli/blob/main/src/azure-cli/azure/cli/command_modules/storage/operations/blob.py#L571
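The adaptive approach referenced in [5] can be sketched as follows. This is not azure-cli's actual code, only the idea behind it: grow the block size just enough that the blob still fits in 50,000 blocks.

```python
import math

MAX_BLOCKS = 50_000
MiB = 1024 ** 2
DEFAULT_BLOCK = 4 * MiB  # azure-storage-blob default [2][3]

def adjusted_block_size(blob_bytes: int) -> int:
    """Pick the smallest block size >= the default that keeps the
    block count at or under MAX_BLOCKS (the idea behind azure-cli's
    _adjust_block_blob_size [5]; not its actual implementation)."""
    return max(DEFAULT_BLOCK, math.ceil(blob_bytes / MAX_BLOCKS))

big_sstable = int(267.339 * 1024 ** 3)  # the file that failed above
block = adjusted_block_size(big_sstable)
blocks_needed = math.ceil(big_sstable / block)
print(block / MiB, blocks_needed)  # ~5.48 MiB, at most 50,000 blocks
```

For files under the 195 GiB default ceiling this returns the 4 MiB default unchanged, so small uploads are unaffected.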
Environment
$ /usr/bin/medusa --version
0.17.1
$ cassandra -v
4.0.11
$ python --version
Python 2.7.18
$ python3 --version
Python 3.8.10
$ az version
{
"azure-cli": "2.56.0",
"azure-cli-core": "2.56.0",
"azure-cli-telemetry": "1.1.0",
"extensions": {}
}