zarr-python cannot read arrays saved by tensorstore using the zstd compressor #2056
Comments
I previously discussed the root cause of this here:
Here's a more compact reproducer. The error occurs with zarr-python version 3.0.2.
Reproducer
import zarr
import tensorstore as ts
zarr_path = "reproduce_zarr-python_issue_2056.zarr"
arr = ts.open({
    "driver": "zarr",
    "kvstore": {
        "driver": "file",
        "path": zarr_path
    },
    "key_encoding": "/",
    "metadata": {
        "shape": [1024, 1024],
        "chunks": [128, 128],
        "dtype": "|u1",
        "compressor": {
            "id": "zstd",
            "level": 5
        }
    }
}, create=True, delete_existing=True).result()
arr.write(1).result()
# open with tensorstore
print(f"Opening {zarr_path} with tensorstore")
arr2 = ts.open({
    "driver": "zarr",
    "kvstore": {
        "driver": "file",
        "path": zarr_path
    }
}).result()
# read first chunk with tensorstore
print(f"Reading first chunk with tensorstore")
print(arr2[:128,:128].read().result())
# open with zarr-python
print(f"Opening {zarr_path} with zarr-python")
arr3 = zarr.open(zarr_path)
# read first chunk with zarr-python
print(f"Reading the first chunk with zarr-python")
print(arr3[:128,:128])
# File "numcodecs/zstd.pyx", line 184, in numcodecs.zstd.decompress
# RuntimeError: Zstd decompression error: invalid input data
Output
pixi.toml
[project]
name = "reproducer"
version = "0.1.0"
description = "Add a short description here"
authors = ["Mark Kittisopikul <markkitt@gmail.com>"]
channels = ["conda-forge"]
platforms = ["linux-64"]
[tasks]
[dependencies]
zarr = ">=3.0.2,<4"
tensorstore = ">=0.1.65,<0.2"
Non-reproduction
The problem does not occur if TensorStore writes a Zarr v3 array, because the zstd frame header then contains a known frame content size.
import zarr
import tensorstore as ts
zarr_path = "nonreproduce_zarr-python_issue_2056.zarr"
arr = ts.open({
    "driver": "zarr3",
    "kvstore": {
        "driver": "file",
        "path": zarr_path
    },
    "metadata": {
        "shape": [1024, 1024],
        "chunk_grid": {
            "name": "regular",
            "configuration": {
                "chunk_shape": [128, 128]
            }
        },
        "data_type": "uint8",
        "codecs": [{
            "name": "zstd",
            "configuration": {
                "level": 5
            }
        }]
    }
}, create=True, delete_existing=True).result()
arr.write(1).result()
# open with tensorstore
print(f"Opening {zarr_path} with tensorstore")
arr2 = ts.open({
    "driver": "zarr3",
    "kvstore": {
        "driver": "file",
        "path": zarr_path
    }
}).result()
# read first chunk with tensorstore
print(f"Reading first chunk with tensorstore")
print(arr2[:128,:128].read().result())
# open with zarr-python
print(f"Opening {zarr_path} with zarr-python")
arr3 = zarr.open(zarr_path)
# read first chunk with zarr-python
print(f"Reading the first chunk with zarr-python")
print(arr3[:128,:128])
Output
One indication of the difference between the reproducer and the non-reproducer is the information that the zstd command-line utility reports about the compressed file.
Note that the command line utility can decompress either.
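The same difference can be checked from Python without the command-line utility. The following is a minimal sketch, assuming the frame header layout described in RFC 8878 (Zstandard); the function name `zstd_frame_content_size` is hypothetical and is not part of zarr-python, numcodecs, or tensorstore. It reads only the header of the first frame and returns the declared content size, or `None` when the frame does not record one.

```python
from typing import Optional

# Zstandard magic number 0xFD2FB528, stored little-endian (RFC 8878).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def zstd_frame_content_size(data: bytes) -> Optional[int]:
    """Return Frame_Content_Size of the first zstd frame, or None if absent.

    Hypothetical helper for illustration; it inspects the header only and
    does not decompress anything.
    """
    if data[:4] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")
    if len(data) < 5:
        raise ValueError("truncated frame header")

    descriptor = data[4]                     # Frame_Header_Descriptor
    fcs_flag = descriptor >> 6               # bits 7-6
    single_segment = (descriptor >> 5) & 1   # bit 5
    dict_id_flag = descriptor & 0b11         # bits 1-0

    # Size in bytes of the Frame_Content_Size field.
    # Flag 0 means "absent" unless Single_Segment_flag is set.
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_size == 0:
        return None

    offset = 5
    if not single_segment:
        offset += 1                          # skip Window_Descriptor
    offset += (0, 1, 2, 4)[dict_id_flag]     # skip Dictionary_ID

    field = data[offset:offset + fcs_size]
    if len(field) != fcs_size:
        raise ValueError("truncated frame header")

    value = int.from_bytes(field, "little")
    if fcs_size == 2:
        value += 256                         # 2-byte form is stored minus 256
    return value
```

To use it, pass the raw bytes of one chunk file, for example the first chunk of the reproducer array (the path `reproduce_zarr-python_issue_2056.zarr/0/0` is an assumption based on the `"/"` key encoding). If the diagnosis in this thread holds, chunks written by the Zarr v2 reproducer should yield `None`, and chunks from the Zarr v3 non-reproducer should yield an integer.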
Zarr version
v2.18.2
Numcodecs version
v0.12.1
Python Version
3.12.4
Operating System
Linux
Installation
using conda
Description
I get the following error when trying to open a dataset compressed with tensorstore using the zstd compressor.
Steps to reproduce
Additional output
xref: google/tensorstore#182