Mixed formats in catalog #414
Comments
👍🏽 The current behavior is definitely a bug, introduced in #368. We should be able to alter lines 43 to 46 in e24e39c.
I'm happy to look into this unless you want to work on it :)
I have time to work on this; I can try to come up with a PR quite soon.
Awesome! I think the necessary changes are straightforward:
Is your feature request related to a problem? Please describe.
I work at a climate service provider. We do most of our work on local servers, and a lot of the data is local. We want to use `intake_esm` to handle these in-house datasets at two levels: the organisation level and the project level. Keeping a reasonably clean catalog at the project level is possible, but it is complicated at the org level: I want to include as much of what we have as possible in a single catalog, and it gets messy. This means both netCDF and Zarr, as we are trying to move from the first to the second, but it is evidently impossible to abandon netCDF completely.
Currently, `assets.format` is mandatory and must be a single string (`"netcdf"` or `"zarr"`). The catalog and `DataSource` logic therefore assume that all URLs/paths are opened with the same xarray engine.
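To make the limitation concrete: a single catalog-wide format means a single engine, whereas a per-row format only needs a lookup from format string to engine name for each asset. A minimal sketch, assuming a hypothetical `ENGINE_FOR_FORMAT` mapping (choosing `netcdf4` for netCDF is an assumption; `h5netcdf` would also be a valid xarray engine) and plain dicts standing in for catalog rows:

```python
# Hypothetical mapping from catalog format strings to xarray engine names.
# "netcdf4" is an assumed choice for netCDF; "h5netcdf" is another option.
ENGINE_FOR_FORMAT = {"netcdf": "netcdf4", "zarr": "zarr"}


def engine_for(fmt: str) -> str:
    """Return the xarray engine name for a (case-insensitive) format string."""
    try:
        return ENGINE_FOR_FORMAT[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported data format: {fmt!r}") from None


def engines_for_rows(rows, format_key="format"):
    """Resolve one engine per row instead of one engine for the whole catalog."""
    return [engine_for(row[format_key]) for row in rows]


# Example rows mixing netCDF and Zarr assets (paths are illustrative only).
rows = [
    {"path": "/data/tas_1990.nc", "format": "netcdf"},
    {"path": "/data/tas_2000.zarr", "format": "zarr"},
]
print(engines_for_rows(rows))  # ['netcdf4', 'zarr']
```

Each engine name could then be passed as the `engine=` argument of `xarray.open_dataset` for that row.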
Describe the solution you'd like
The esm-collection-spec defines `assets.format_column_name`, described as: "The column name which contains the data format, allowing for variable data types in one catalog. Mutually exclusive with format."
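A catalog loader honoring that rule has to accept exactly one of the two keys. A minimal sketch, assuming the `assets` block has already been parsed into a plain dict; `resolve_format_source` is a hypothetical helper, and rejecting a block that has neither key is an assumption beyond the quoted spec text:

```python
from __future__ import annotations


def resolve_format_source(assets: dict) -> tuple[str, str]:
    """Return ("format", value) or ("format_column_name", value)."""
    has_format = assets.get("format") is not None
    has_column = assets.get("format_column_name") is not None
    if has_format and has_column:
        # The spec says the two keys are mutually exclusive.
        raise ValueError("'format' and 'format_column_name' are mutually exclusive")
    if not (has_format or has_column):
        # Assumption: a catalog must declare its format one way or the other.
        raise ValueError("one of 'format' or 'format_column_name' is required")
    if has_format:
        return ("format", assets["format"])
    return ("format_column_name", assets["format_column_name"])


# A single-format catalog and a mixed-format catalog.
print(resolve_format_source({"column_name": "path", "format": "zarr"}))
print(resolve_format_source({"column_name": "path", "format_column_name": "format"}))
```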
I'd like this to be reflected in `intake_esm`.
Describe alternatives you've considered
Having two catalogs, or implementing some messy workaround in a custom subclass of `esm_datastore`.
Additional context
I can suggest a PR. My idea is the following:
- Make `Assets.format` optional, and add `None` as a valid enum item of `DataFormat`.
- Add `format_column_name` to `ESMDataSource` and pass it in `esm_datastore.__getitem__`.
- `ESMDataSource.__init__`: if `data_format` is not None, set a new `_data_format` column to that value; if it is None, rename the `format_column_name` column to `_data_format`.
- `ESMDataSource._open_dataset`

That's the most elegant way I could think of, but I'm open to suggestions if you're OK with this addition.