Add intake_xarray_kwargs to ThreddsCatalog #52

Merged: 8 commits merged on Mar 18, 2022


@andersy005 (Member, Author) commented Sep 15, 2021:

@raybellwaves, this is my attempt at addressing #51. I opted for intake_xarray_kwargs instead of xarray_kwargs so as to use it as a catch-all argument for all options that can be passed to the intake_xarray sources (such as https://github.com/intake/intake-xarray/blob/f1ca02d5c7734bb9de79074fe128ac5e7d598165/intake_xarray/netcdf.py#L48).
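The catch-all idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeNetCDFSource` and `FakeThreddsCatalog` are hypothetical stand-ins, and the `chunks` option and example URL are assumptions made for the demo. The only point is that one dictionary is unpacked into each source's constructor, so nested options such as `xarray_kwargs` travel along with any other source-level option.

```python
from typing import Any, Dict, Optional


class FakeNetCDFSource:
    """Hypothetical stand-in for an intake_xarray source (not the real class)."""

    def __init__(
        self,
        urlpath: str,
        chunks: Optional[Dict[str, int]] = None,
        xarray_kwargs: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        self.urlpath = urlpath
        self.chunks = chunks
        # Options meant for xarray's open_dataset stay nested under this key.
        self.xarray_kwargs = xarray_kwargs or {}
        # Anything else the caller passed is kept as-is.
        self.extra = kwargs


class FakeThreddsCatalog:
    """Hypothetical catalog that forwards one catch-all dict to every entry."""

    def __init__(
        self, url: str, intake_xarray_kwargs: Optional[Dict[str, Any]] = None
    ) -> None:
        self.url = url
        # Copy so later mutation by the caller does not affect the catalog.
        self.intake_xarray_kwargs = dict(intake_xarray_kwargs or {})

    def entry(self, name: str) -> FakeNetCDFSource:
        # The whole dict is unpacked into the source constructor.
        return FakeNetCDFSource(f"{self.url}/{name}", **self.intake_xarray_kwargs)


cat = FakeThreddsCatalog(
    "https://example.invalid/thredds",  # placeholder URL, an assumption
    intake_xarray_kwargs={"xarray_kwargs": {"engine": "netcdf4"}, "chunks": {"time": 1}},
)
src = cat.entry("file.nc")
print(src.xarray_kwargs, src.chunks)
```

A single catch-all argument keeps the catalog signature stable if the underlying sources later accept more options, at the cost of one extra level of nesting for the xarray-specific ones.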

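Right now things appear to be broken: as the traceback below shows, intake_xarray opens the URL with fsspec and passes the resulting file-like object, rather than the path string, to xarray's open_dataset, which the netcdf4 engine rejects.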

In [1]: import intake

In [2]: cat_url = "https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2/catalog.xml"

In [3]: catalog = intake.open_thredds_cat(cat_url, driver="netcdf", intake_xarray_kwargs={'xarray_kwargs': {'engine': "netcdf4"}})

In [5]: source = catalog["GFS_Global_0p25deg_20210913_1800.grib2"]
In [7]: source.to_dask()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-75a159db6bd7> in <module>
----> 1 source.to_dask()

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/intake_xarray/netcdf.py in _open_dataset(self)
     90             url = fsspec.open(self.urlpath, **self.storage_options).open()
     91 
---> 92         self._ds = _open_dataset(url, chunks=self.chunks, **kwargs)
     93 
     94     def _add_path_to_ds(self, ds):

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495 
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    549 
    550         filename_or_obj = _normalize_path(filename_or_obj)
--> 551         store = NetCDF4DataStore.open(
    552             filename_or_obj,
    553             mode=mode,

~/.mambaforge/envs/intake-thredds-dev/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    351 
    352         if not isinstance(filename, str):
--> 353             raise ValueError(
    354                 "can only read bytes or file-like objects "
    355                 "with engine='scipy' or 'h5netcdf'"

ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf'
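The failure above can be reproduced in miniature without xarray or fsspec. The sketch below is a simplified model of the type check visible in the traceback, not xarray's actual code: `classify_target` and `FILE_LIKE_ENGINES` are hypothetical names, and `io.BytesIO` stands in for the file-like object that `fsspec.open(...).open()` returns.

```python
import io

# Engines that, per the error message above, accept bytes or file-like objects.
FILE_LIKE_ENGINES = {"scipy", "h5netcdf"}


def classify_target(target, engine):
    """Simplified model of the check that raised the ValueError above.

    Returns "path" for string targets and "file-like" when the engine can
    read an open file object; otherwise raises the same error message.
    """
    if isinstance(target, str):
        return "path"
    if engine in FILE_LIKE_ENGINES:
        return "file-like"
    raise ValueError(
        "can only read bytes or file-like objects "
        "with engine='scipy' or 'h5netcdf'"
    )


opened = io.BytesIO(b"")  # stand-in for an fsspec file object

# A plain URL string passes the check for the netcdf4 engine.
print(classify_target("https://example.invalid/data.nc", "netcdf4"))

# The same engine rejects an already-opened file object.
try:
    classify_target(opened, "netcdf4")
except ValueError as err:
    print("rejected:", err)
```

Under this model, the combination in the session fails because the source hands over an opened file while the requested engine is `netcdf4`; passing the URL string through unchanged, or choosing an engine that reads file-like objects, would avoid the check.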

@andersy005 andersy005 added the enhancement New feature or request label Sep 15, 2021
Two review threads on tests/test_cat.py (outdated, resolved)
@aaronspring (Collaborator) commented:

Just fixed that one test. Can this PR be merged?

@aaronspring aaronspring self-requested a review March 17, 2022 23:06
@andersy005 (Member, Author) commented:

Thank you, @aaronspring! Let's go ahead and merge this as is. If there is any issue, we can address it later.

@andersy005 andersy005 merged commit 5ec2660 into main Mar 18, 2022
@andersy005 andersy005 deleted the xarray-open-kwargs branch March 18, 2022 02:00
Labels: enhancement
Linked issue that merging this pull request may close: add xarray_kwargs to ThreddsCatalog
2 participants