zarr does not like attrs added by intake-esm #324

Closed
jbusecke opened this issue Feb 18, 2021 · 2 comments · Fixed by #509
Labels
bug Issues that present a reasonable conviction there is a reproducible bug. good first issue Good for newcomers

Comments

@jbusecke

Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where that ends up being None (still looking for the exact model).

Zarr does not like that type of metadata:

import xarray as xr
ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None

ds_test.to_zarr('test.zarr')

gives

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-54ff2a1799bc> in <module>
      3 ds_test.attrs['test'] = None
      4 
----> 5 ds_test.to_zarr('test.zarr')

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region)
   1743             encoding = {}
   1744 
-> 1745         return to_zarr(
   1746             self,
   1747             store=store,

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region)
   1453     # validate Dataset keys, DataArray names, and attr keys/values
   1454     _validate_dataset_names(dataset)
-> 1455     _validate_attrs(dataset)
   1456 
   1457     if mode == "a":

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/api.py in _validate_attrs(dataset)
    237     # Check attrs on the dataset itself
    238     for k, v in dataset.attrs.items():
--> 239         check_attr(k, v)
    240 
    241     # Check attrs on each variable within the dataset

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/api.py in check_attr(name, value)
    228 
    229         if not isinstance(value, (str, Number, np.ndarray, np.number, list, tuple)):
--> 230             raise TypeError(
    231                 f"Invalid value for attr {name!r}: {value!r} must be a number, "
    232                 "a string, an ndarray or a list/tuple of "

TypeError: Invalid value for attr 'test': None must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files

Should the attribute be set to a string 'none' instead?
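As a user-side stopgap, the offending values can be dropped or replaced before writing. The following is only a sketch: `sanitize_attrs` is a hypothetical helper (not part of xarray, zarr, or intake-esm), and it only handles a flat attrs mapping such as `ds_test.attrs`. Attributes on individual variables would need the same treatment.

```python
from typing import Any, Dict


def sanitize_attrs(attrs: Dict[str, Any], replace_with: Any = None) -> Dict[str, Any]:
    """Return a copy of ``attrs`` without None values (hypothetical helper).

    If ``replace_with`` is given (for example the string 'none'), None values
    are replaced by it instead of being dropped.
    """
    cleaned = {}
    for key, value in attrs.items():
        if value is None:
            if replace_with is None:
                continue  # drop the attribute entirely
            value = replace_with
        cleaned[key] = value
    return cleaned


# Intended use with the example above (assumes xarray is installed):
#   ds_test.attrs = sanitize_attrs(ds_test.attrs)
#   ds_test.to_zarr('test.zarr')
```

Passing `replace_with='none'` corresponds to the string option raised in the question above; omitting it removes the attribute.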

@andersy005

Good catch. We could add some validators, or just remove the custom metadata once we are done assembling the datasets of interest.

@andersy005

The easiest solution is to remove the custom metadata info used by intake-esm. These can be removed right before returning the dataset in these two locations:

)
ds.attrs['intake_esm_dataset_key'] = self.key
self._ds = ds
return ds

ds.attrs['intake_esm_dataset_key'] = self.key
self._ds = ds
return ds
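A minimal sketch of what removing the custom metadata could look like, assuming every intake-esm attribute shares the `intake_esm_` prefix (as `intake_esm_varname` and `intake_esm_dataset_key` do). The helper name `strip_intake_esm_attrs` and the prefix default are illustrative assumptions, not the code that was eventually merged.

```python
from typing import Any, Dict


def strip_intake_esm_attrs(attrs: Dict[str, Any], prefix: str = "intake_esm_") -> Dict[str, Any]:
    """Return a copy of ``attrs`` without keys starting with ``prefix``.

    Hypothetical helper; the prefix is an assumption based on the two
    attribute names mentioned in this thread.
    """
    return {
        key: value
        for key, value in attrs.items()
        if not str(key).startswith(prefix)
    }


# Possible use right before returning the dataset (sketch only):
#   ds.attrs = strip_intake_esm_attrs(ds.attrs)
#   self._ds = ds
#   return ds
```

Filtering by prefix would also cover the `None`-valued `intake_esm_varname` case from the original report, since the attribute is removed regardless of its value.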
