zstd failure for unsepcified buffer size #424
FWIW I have now run into this twice with other implementations:
```python
import zarr
import zarrs
import numpy as np

default_pipeline = zarr.config.get("codec_pipeline.path")
with zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"}):
    memory_array = np.array(25)
    z = zarr.create_array("foo.zarr", shape=memory_array.shape, dtype=memory_array.dtype, compressors=[zarr.codecs.ZstdCodec()])
    z[...] = memory_array

with zarr.config.set({"codec_pipeline.path": default_pipeline}):
    z = zarr.open_array("foo.zarr")
    z[...]
```

The other one is the GPU implementation from nvcomp 3.0 as implemented/tested in https://github.com/ilan-gold/cuda-zarr

Env:
```
asciitree==0.3.3
asttokens==3.0.0
crc32c==2.7.1
decorator==5.2.1
deprecated==1.2.18
donfig==0.8.1.post1
executing==2.2.0
fasteners==0.19
fsspec==2025.2.0
iniconfig==2.0.0
ipython==8.32.0
jedi==0.19.2
matplotlib-inline==0.1.7
msgpack==1.1.0
numcodecs==0.15.1
numpy==2.2.3
packaging==24.2
parso==0.8.4
pexpect==4.9.0
pluggy==1.5.0
prompt-toolkit==3.0.50
ptyprocess==0.7.0
pure-eval==0.2.3
pygments==2.19.1
pytest==8.3.4
pyyaml==6.0.2
stack-data==0.6.3
traitlets==5.14.3
typing-extensions==4.12.2
universal-pathlib==0.2.6
wcwidth==0.2.13
wrapt==1.17.2
zarr==3.0.4
zarrs==0.1.3
zstandard==0.23.0
```
I think #707 fixes this, but we really should just depend on an external zstd.
@d-v-b What would that look like? I am not so familiar with the space.
I'm not familiar with this space either, but ideally numcodecs would list "zstd" explicitly as a dependency (required or not, I'm not sure), and that dependency would resolve to a maintained source of zstd. https://github.com/indygreg/python-zstandard looks fine for our purposes. I don't think we need cramjam for this?
No, probably not! I think that library looks good as well.
I'm very heavily in favour of depending on external packages instead of vendoring as a future direction for numcodecs. It will need doing carefully though. If there's more than one of us in favour of going down this road, then I think we should just do it, but post on a slightly wider forum (open an issue and ping zarr-python developers?) so others have an opportunity to comment before we commit.
No, we don't need cramjam, but it might be more convenient than grabbing the separate python packages for each compression algo (this is why fastparquet chose it). It might also be faster in some cases.
@martindurant That does make sense, hadn't thought of that. It seems like it'd be all or nothing since
Reported in kerchunk: fsspec/kerchunk#317 (comment) by @cgohlke
Minimal, reproducible code sample, a copy-pastable example if possible
Problem description
This means that we can always decompress chunks written by our own Zstd, but not necessarily those written by others. This is another case in which we could simply use cramjam or imagecodecs (or even zstandard) rather than building our own Cython extension. In most cases we will know the size of the decompressed data, because it must fit the output (but other filters in the chain may spoil this).