Reading multiple ICESat-2 ATL11 point cloud data nicely via Zarr #100
Comments
Putting down some notes on a potential
Just some things to play with once I get some free time 🙂 |
Awesome! Regarding the |
We're actually working on some benchmarks over in that repo (e.g. ICESAT-2HackWeek/h5cloud#9), and the |
Gathering some notes on how best to read multiple ICESat-2 ATL11 data files (basically a point cloud) in a user-friendly way, with metadata preserved!
TLDR: Be able to do `xr.open_mfdataset("ATL11_*.h5", engine="zarr", ...)`.

Inspired by the blog post "Cloud-Performant NetCDF4/HDF5 Reading with the Zarr Library". Zarr is an amazing project, and I really like the `.zmetadata` JSON file, which can be opened with a text editor and tells you stuff about the data. The dream would be to read HDF5 files in an out-of-core manner with Zarr-like speed/abilities (through the `.zmetadata` pointer).

Jupyter notebook demo can be found at https://github.com/rsignell-usgs/hurricane-ike-water-levels/blob/master/coawst_3ways.ipynb. See also the discussion thread at zarr-developers/zarr-python#535 on "Using the Zarr library to read HDF5".
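To make the "open it with a text editor" point concrete, here is a minimal sketch that summarizes a `.zmetadata` document using only the standard library. It assumes the Zarr v2 consolidated metadata layout (a top-level `metadata` mapping keyed by paths such as `name/.zarray` and `name/.zattrs`). The array name, shape, chunking and attributes below are made up for illustration, and `summarize_zmetadata` is a hypothetical helper, not part of Zarr or xarray.

```python
import json

# Hand-written example of consolidated Zarr v2 metadata (illustrative values only,
# not taken from a real ATL11 store).
ZMETADATA_TEXT = """
{
  "zarr_consolidated_format": 1,
  "metadata": {
    ".zgroup": {"zarr_format": 2},
    "h_corr/.zarray": {
      "shape": [1000, 8], "chunks": [250, 8], "dtype": "<f4",
      "compressor": null, "fill_value": "NaN", "filters": null,
      "order": "C", "zarr_format": 2
    },
    "h_corr/.zattrs": {
      "_ARRAY_DIMENSIONS": ["ref_pt", "cycle_number"],
      "units": "meters"
    }
  }
}
"""


def summarize_zmetadata(text):
    """Return {array_name: summary} for every array listed in a .zmetadata document."""
    entries = json.loads(text)["metadata"]
    summary = {}
    for key, value in entries.items():
        # Keys look like "h_corr/.zarray"; split off the trailing metadata file name.
        name, _, kind = key.rpartition("/")
        if kind != ".zarray":
            continue  # skip .zgroup and .zattrs entries
        attrs_key = f"{name}/.zattrs" if name else ".zattrs"
        attrs = entries.get(attrs_key, {})
        summary[name] = {
            "shape": tuple(value["shape"]),
            "chunks": tuple(value["chunks"]),
            "dtype": value["dtype"],
            # xarray stores dimension names under this attribute in Zarr v2 stores.
            "dims": tuple(attrs.get("_ARRAY_DIMENSIONS", ())),
        }
    return summary


for name, info in summarize_zmetadata(ZMETADATA_TEXT).items():
    print(name, info)
```

For a real store, the same function could be fed the contents of the store's `.zmetadata` file, e.g. `summarize_zmetadata(open("some_store.zarr/.zmetadata").read())`, where the path is a placeholder.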
Main hurdles to get through, both dependent on upstream; there are two 'separate' parts:

- `chunk_store` argument to use Zarr to read HDF5 - wait for "Allow chunk_store argument when opening Zarr datasets" pydata/xarray#3804
- `xr.open_mfdataset` - wait for "Xarray open_mfdataset with engine Zarr" pydata/xarray#4187 / "xarray.open_mzar: open multiple zarr files (in parallel)" pydata/xarray#4003
  - `intake.open_ndzarr` will break with the above ☝️ - wait for "xarray.open_zarr to be deprecated" intake/intake-xarray#70

Current situation is that I do an HDF5 -> Zarr conversion and read from that. It would be nice to stick to the original HDF5 data source (though I might need to flatten the nested ICESat-2 ATL11 data structure). Note that I'm not necessarily after raw speed; I just prefer readability (i.e. having xarray's wonderful annotated metadata).
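On the flattening idea: one possible approach is to turn each nested group/variable pair into a single path-like key, so that every variable lives at one level. The sketch below is stdlib-only and uses plain dicts as a stand-in for HDF5 groups. The group and variable names (`pt1`, `pt2`, `h_corr`, `ref_surf/dem_h`) and the `flatten_groups` helper are illustrative assumptions, not a confirmed description of how the conversion is done here.

```python
from collections.abc import Mapping


def flatten_groups(tree, prefix="", sep="/"):
    """Flatten nested groups into a single-level {"group/sub/variable": value} dict."""
    flat = {}
    for name, node in tree.items():
        path = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(node, Mapping):
            # A nested mapping plays the role of an HDF5 group: recurse into it.
            flat.update(flatten_groups(node, prefix=path, sep=sep))
        else:
            # Anything else is treated as a variable (leaf).
            flat[path] = node
    return flat


# Hypothetical stand-in for the nested layout of one file.
nested = {
    "pt1": {"h_corr": [1.0, 2.0], "ref_surf": {"dem_h": [0.5, 0.6]}},
    "pt2": {"h_corr": [3.0, 4.0]},
}

flat = flatten_groups(nested)
print(sorted(flat))  # ['pt1/h_corr', 'pt1/ref_surf/dem_h', 'pt2/h_corr']
```

With real files, a similar walk could probably be done with h5py's `Group.visititems`, collecting datasets by their full path instead of recursing over dicts.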
Other open Issues/Pull Requests:
Blog posts:
You can tell I had way too many tabs open on my browser 😆