The structure of JSON serialization used by most of the codecs in this repo is based on the following requirement from the zarr v2 spec:
compressor
A JSON object identifying the primary compression codec and providing
configuration parameters, or null if no compressor is to be used.
The object MUST contain an "id" key identifying the codec to be used.
This leads to JSON like {"id": "gzip", "level": 1}.
Putting the name of the codec in the same namespace as the codec parameters has a drawback: no codec can have a parameter named id. An improvement would be to separate the codec name from the codec configuration into two separate namespaces, e.g. {"name": "codec_name", "configuration": {"name": "config_name", "param0": 0, ...}}.
This is the style defined in the zarr v3 spec, and it is used by the codecs defined alongside that spec.
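To make the difference concrete, here is a minimal sketch of converting between the two styles. The function names `v2_to_v3` and `v3_to_v2` are hypothetical and not part of numcodecs, and the sketch assumes the codec name is spelled the same in both styles, which may not hold for every codec.

```python
from typing import Any


def v2_to_v3(config: dict[str, Any]) -> dict[str, Any]:
    """Convert {"id": ..., **params} into {"name": ..., "configuration": {...}}."""
    # Everything except "id" is a codec parameter in the v2 style.
    params = {key: value for key, value in config.items() if key != "id"}
    return {"name": config["id"], "configuration": params}


def v3_to_v2(config: dict[str, Any]) -> dict[str, Any]:
    """Convert {"name": ..., "configuration": {...}} into {"id": ..., **params}."""
    params = config.get("configuration", {})
    # The v2 style cannot represent a parameter named "id": it would collide
    # with the key that identifies the codec.
    if "id" in params:
        raise ValueError("a parameter named 'id' cannot be expressed in the v2 style")
    return {"id": config["name"], **params}
```

Note that a v3-style configuration may contain its own "name" parameter without any clash, whereas a parameter named "id" has no v2 equivalent.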
I would propose that we formalize these two representations in this library, and ensure that the data encoding / decoding behavior of a codec is abstracted away from this JSON encoding difference.
Every codec should be able to serialize to either the zarr v2 or zarr v3 style, maybe by adding a suitable keyword argument to get_config.
Every codec should be able to deserialize from either the zarr v2 or zarr v3 JSON style. This doesn't actually require any changes to from_config, but upstream code will need to know how to extract the config from the two different JSON flavors.
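A rough sketch of how these two points could fit together. `ToyGzip`, the `zarr_format` keyword, and `to_v2_style` are all hypothetical names chosen for illustration; this is not the existing numcodecs API, only one possible shape for it.

```python
from typing import Any, Literal


class ToyGzip:
    """Hypothetical stand-in for a codec; not the real numcodecs gzip codec."""

    codec_id = "gzip"

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def get_config(self, zarr_format: Literal[2, 3] = 2) -> dict[str, Any]:
        # The keyword only selects the JSON layout; the parameters are the same.
        params = {"level": self.level}
        if zarr_format == 2:
            return {"id": self.codec_id, **params}
        if zarr_format == 3:
            return {"name": self.codec_id, "configuration": params}
        raise ValueError(f"unsupported zarr_format: {zarr_format!r}")

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> "ToyGzip":
        # Unchanged v2-style behavior: a flat dict, with "id" ignored.
        params = {key: value for key, value in config.items() if key != "id"}
        return cls(**params)


def to_v2_style(config: dict[str, Any]) -> dict[str, Any]:
    """Upstream helper (assumed): accept either flavor, return the flat v2 style."""
    if "id" in config:
        return dict(config)
    if "name" in config:
        return {"id": config["name"], **config.get("configuration", {})}
    raise ValueError("config has neither an 'id' key nor a 'name' key")
```

With this split, `ToyGzip.from_config(to_v2_style(cfg))` works for a config in either style, and `from_config` itself never needs to know which flavor it started from.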
This is not required by the above, but it would also be great to publish the configuration type of each codec as a TypedDict.
These changes would go some of the way towards allowing us to remove the zarr3-specific code that's in numcodecs today.
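For illustration, the published types for a gzip-like codec might look something like the following. The class names are hypothetical, and the single `level` parameter is taken from the {"id": "gzip", "level": 1} example above rather than from any existing type definitions.

```python
from typing import Literal, TypedDict


class GzipConfigV2(TypedDict):
    """zarr v2 style: codec name and parameters share one namespace."""

    id: Literal["gzip"]
    level: int


class GzipConfiguration(TypedDict):
    """Parameters only, as nested under "configuration" in the v3 style."""

    level: int


class GzipConfigV3(TypedDict):
    """zarr v3 style: codec name and parameters live in separate namespaces."""

    name: Literal["gzip"]
    configuration: GzipConfiguration


# A static type checker can validate these literals against the declared types.
v2_config: GzipConfigV2 = {"id": "gzip", "level": 1}
v3_config: GzipConfigV3 = {"name": "gzip", "configuration": {"level": 1}}
```

At runtime these are plain dicts, so the types add no overhead; the benefit is for downstream code that wants its codec configs checked statically.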