Virtual Zarr Support #11

alxmrs · 2024-06-17T12:00:32Z

This is quite a catalog of weather data! Congrats, I’m really impressed.

If you wanted this catalog to be truly overpowered (i.e. useful to weather researchers, geospatial scientists, etc.), I recommend finding a way to make all this data accessible via Zarr. With Zarr V3 around the corner, you should be able to add a few metadata files at the top of each bucket to make everything Zarr-accessible (in a language-agnostic way). Here are a few pointers to get started:

https://github.com/zarr-developers/VirtualiZarr (recommended approach)
https://fsspec.github.io/kerchunk/
https://pangeo-forge.readthedocs.io/en/latest/

Patrick, please reach out to me over email (al(at)merose(dot)com); I’d love to collaborate with you on what you’re building.

patrick-zippenfenig · 2024-06-18T10:42:48Z

Hi @alxmrs! Thanks for sharing.

I am not sure how feasible it is to access om files from this data repository directly. The file format is highly specialized for fast and efficient storage of gridded time series. For performance, Open-Meteo uses the Swift programming language with bindings to C code.

It would be feasible to write client libraries for other programming languages, but the om file format is not intended as a general purpose format. I am thinking about some extensions to make it more generic (more data dimensions, metadata attributes), but it is still very domain specific.

Is VirtualiZarr a server-side implementation that returns data in chunks (like Apache Arrow Feather)? Or is the general idea to read om files directly in Python?

Additionally, this data repository requires metadata attributes that are hard-coded in the primary Open-Meteo repository, e.g. information about the data grid, the time resolution and the length of each time chunk. This would be easy to expose as a JSON file.
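As a rough illustration of what such a JSON file might contain, here is a minimal sketch. Every key and value below (the grid description, `time_resolution_seconds`, `time_chunk_length`) and the `time_chunk_index` helper are hypothetical placeholders, not the actual constants or schema from the Open-Meteo repository.

```python
import json

# Hypothetical per-model metadata; all names and numbers are illustrative
# assumptions, not Open-Meteo's real schema.
meta = {
    "grid": {
        "nx": 1440,
        "ny": 721,
        "lat_min": -90.0,
        "lon_min": -180.0,
        "dx": 0.25,
        "dy": 0.25,
    },
    "time_resolution_seconds": 3600,
    "time_chunk_length": 168,  # number of time steps stored per time chunk
}


def time_chunk_index(step, meta):
    """Return (time-chunk index, offset inside that chunk) for a time step."""
    return divmod(step, meta["time_chunk_length"])


# Serialize to JSON and read it back, as a client in any language could.
text = json.dumps(meta, indent=2)
restored = json.loads(text)
print(text)
```

With a file like this, a client would not need access to the Swift source to work out which time chunk holds a given time step.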

alxmrs · 2024-06-23T10:00:00Z

Hey Patrick,

Thanks for your response! A few thoughts:

> I am not sure how feasible it is to access om files from this data repository directly.

It may not be feasible... yet. But I think it could be. The beauty of Zarr is that it's more of an array protocol than a file format. Given a few modifications (namely ZEP003), I think it will be possible for clients in various languages to read om data directly.

In this scenario, VirtualiZarr would be run up-front in a batch setting to provide metadata files to a bucket somewhere (say, this open data on S3). From there, Zarr-clients should be able to read the files directly. And, ideally, folks would be able to read om files directly in Python.
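To make the idea concrete, here is a minimal sketch of the kind of metadata file such a batch job could emit, written as a kerchunk-style (version 1) reference set with Zarr v2 array metadata. It deliberately does not call VirtualiZarr or kerchunk. The bucket URL, the array name `data`, the byte ranges and the `build_references` helper are all hypothetical, and it assumes uncompressed chunks; real om chunks would need a custom codec that is not shown here.

```python
import json


def build_references(url, shape, chunks, dtype, chunk_byte_ranges):
    """Build a kerchunk-style reference set for one 2-D array named "data".

    chunk_byte_ranges maps (i, j) chunk indices to (offset, length) byte
    ranges inside the file at `url`.
    """
    zarray = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,
        "compressor": None,  # assumption: om data would need a custom codec
        "filters": None,
        "fill_value": None,
        "order": "C",
    }
    refs = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "data/.zarray": json.dumps(zarray),
    }
    # Each chunk key points at a byte range in the original file,
    # so no data is copied or rewritten.
    for (i, j), (offset, length) in sorted(chunk_byte_ranges.items()):
        refs[f"data/{i}.{j}"] = [url, offset, length]
    return {"version": 1, "refs": refs}


# Hypothetical file and byte ranges: a 4x4 float32 array in 2x2 chunks
# (16 bytes per chunk).
example = build_references(
    "s3://example-bucket/chunk_0.om",
    shape=(4, 4),
    chunks=(2, 2),
    dtype="<f4",
    chunk_byte_ranges={(0, 0): (0, 16), (0, 1): (16, 16),
                       (1, 0): (32, 16), (1, 1): (48, 16)},
)
print(json.dumps(example, indent=2))
```

The point is that the output is only a small JSON document sitting next to the data; the om files themselves stay untouched.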

I'm happy to hear that the open-meteo constants can be exposed in JSON; that makes me suspect such an integration with Zarr is more feasible.

> the om file format is not intended as a general purpose format

That may be. But, I think you've addressed a really important access pattern that I suspect folks in the Zarr community would want to integrate with (xref: google-research/arco-era5#12).

One major benefit I see with having om be Zarr-readable is that it would make reorganizing the data for different access patterns a matter of rechunking, which is well understood.
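A small back-of-the-envelope sketch of why chunk layout matters for access patterns: it counts how many chunks a read touches under two layouts. The array shape and both chunk shapes are made-up examples, not the actual om or ARCO-ERA5 layouts, and `chunks_touched` is a hypothetical helper.

```python
def chunks_touched(selection, chunks):
    """Count chunks overlapping a selection.

    selection: one (start, stop) pair per axis, stop exclusive.
    chunks: chunk length along each axis.
    """
    n = 1
    for (start, stop), c in zip(selection, chunks):
        first = start // c
        last = (stop - 1) // c
        n *= last - first + 1
    return n


# Hypothetical array: one year of hourly data on a 0.25 degree grid,
# with axes (time, y, x) and shape (8760, 721, 1440).
time_major = (8760, 4, 4)   # long time series per chunk
map_major = (1, 721, 1440)  # one full map per chunk

# A year-long time series at a single grid point.
point_series = ((0, 8760), (100, 101), (200, 201))
# A full spatial map at a single time step.
single_map = ((0, 1), (0, 721), (0, 1440))

for name, layout in [("time_major", time_major), ("map_major", map_major)]:
    print(name,
          "series:", chunks_touched(point_series, layout),
          "map:", chunks_touched(single_map, layout))
```

Each layout is cheap for one access pattern and expensive for the other, which is why being able to rechunk between them with standard tooling is valuable.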

alxmrs mentioned this issue Jun 17, 2024

Support the Open Meteo custom data format fsspec/kerchunk#464

Open
