formalize old and new styles of json serialization #686

Open
d-v-b opened this issue Jan 9, 2025 · 0 comments

@d-v-b
Contributor

d-v-b commented Jan 9, 2025

The structure of JSON serialization used by most of the codecs in this repo is based on the following requirement from the zarr v2 spec:

compressor
A JSON object identifying the primary compression codec and providing
configuration parameters, or null if no compressor is to be used.
The object MUST contain an "id" key identifying the codec to be used.

This leads to JSON like {"id": "gzip", "level": 1}.

Putting the name of the codec in the same namespace as the codec parameters has a drawback: no codec can have a parameter named id. An improvement would be to separate the codec name and the codec configuration into two namespaces, e.g. {"name": "codec_name", "configuration": {"name": "config_name", "param0": 0, ...}}. Because the configuration lives in its own object, it can even contain a parameter called "name" without colliding with the codec name.

This is the style defined in the zarr v3 spec and used by the codecs specified alongside it.
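To make the difference concrete, here is a minimal sketch of converting between the two styles. The function names v2_to_v3 and v3_to_v2 are hypothetical, and the sketch assumes the codec name is spelled the same way in both styles, which won't hold for every codec. Note the asymmetry: going from v2 to v3 always works, but going back fails for any codec with a parameter named "id".

```python
from typing import Any


def v2_to_v3(v2: dict[str, Any]) -> dict[str, Any]:
    """Turn {"id": name, **params} into {"name": name, "configuration": params}."""
    # Everything except "id" is a codec parameter in the v2 style.
    params = {k: v for k, v in v2.items() if k != "id"}
    return {"name": v2["id"], "configuration": params}


def v3_to_v2(v3: dict[str, Any]) -> dict[str, Any]:
    """Turn {"name": name, "configuration": params} into {"id": name, **params}."""
    params = v3.get("configuration", {})
    if "id" in params:
        # The flat v2 namespace has no room for a parameter called "id".
        raise ValueError("a parameter named 'id' cannot be expressed in the zarr v2 style")
    return {"id": v3["name"], **params}


v3 = v2_to_v3({"id": "gzip", "level": 1})
print(v3)  # {'name': 'gzip', 'configuration': {'level': 1}}
print(v3_to_v2(v3))  # {'id': 'gzip', 'level': 1}
```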

I would propose that we formalize both representations in this library, and ensure that the data encoding / decoding behavior of each codec is decoupled from the difference between the two JSON styles.

  • Every codec should be able to serialize to either the zarr v2 or zarr v3 style, maybe by adding a suitable keyword argument to get_config.
  • Every codec should be able to deserialize from either the zarr v2 or the zarr v3 JSON style. This doesn't actually require any changes to from_config, but upstream code will need to know how to extract the configuration from each of the two JSON flavors.
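A minimal sketch of how the two bullets could fit together, assuming a hypothetical zarr_format keyword on get_config and a hypothetical upstream helper extract_params. GZipSketch is a toy class, not numcodecs.GZip. The point it illustrates is that from_config only ever receives the bare parameters, so it is unaffected by which JSON style was used.

```python
from typing import Any, Literal


class GZipSketch:
    """Toy codec standing in for a numcodecs codec; not the real numcodecs.GZip."""

    codec_id = "gzip"

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def get_config(self, zarr_format: Literal[2, 3] = 2) -> dict[str, Any]:
        # Hypothetical keyword: choose which JSON style to emit.
        params = {"level": self.level}
        if zarr_format == 2:
            return {"id": self.codec_id, **params}
        return {"name": self.codec_id, "configuration": params}

    @classmethod
    def from_config(cls, params: dict[str, Any]) -> "GZipSketch":
        # Takes bare parameters, so it never sees either JSON style.
        return cls(**params)


def extract_params(data: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Hypothetical upstream helper: return (codec name, parameters) for either style."""
    if "id" in data:  # zarr v2 style: name and parameters share one object
        return data["id"], {k: v for k, v in data.items() if k != "id"}
    # zarr v3 style: parameters live under "configuration"
    return data["name"], dict(data.get("configuration", {}))


codec = GZipSketch(level=5)
for fmt in (2, 3):
    name, params = extract_params(codec.get_config(zarr_format=fmt))
    restored = GZipSketch.from_config(params)
    print(fmt, name, restored.level)  # prints "2 gzip 5", then "3 gzip 5"
```

Checking for a top-level "id" key is enough to tell the styles apart here, since a v3-style object only has "name" and "configuration" at its top level.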

This is not required by the above, but it would also be great to publish a type for each codec's JSON configuration in the form of a TypedDict class.
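For the gzip example, such types could look roughly like this. The class names are hypothetical, chosen for illustration, and only the level parameter from the earlier gzip example is modeled. The v2 type inherits the flat-namespace limitation (there is no way to add an "id" parameter), while the v3 type nests the parameters in their own TypedDict.

```python
from typing import Literal, TypedDict


class GZipParams(TypedDict):
    """Parameters shared by both styles (hypothetical type name)."""
    level: int


class GZipConfigV2(TypedDict):
    """zarr v2 style: codec id and parameters in one flat object."""
    id: Literal["gzip"]
    level: int


class GZipConfigV3(TypedDict):
    """zarr v3 style: codec name and parameters in separate namespaces."""
    name: Literal["gzip"]
    configuration: GZipParams


# At runtime these are plain dicts; a type checker validates keys and value types.
v2: GZipConfigV2 = {"id": "gzip", "level": 1}
v3: GZipConfigV3 = {"name": "gzip", "configuration": {"level": 1}}
```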

These changes would go some of the way towards allowing us to remove the zarr3-specific code that's in numcodecs today.
