Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Allow chunk_store argument when opening Zarr datasets #3804

Merged
merged 11 commits into from
Aug 25, 2020
Merged

Allow chunk_store argument when opening Zarr datasets #3804

merged 11 commits into from
Aug 25, 2020

Conversation

ajelenak
Copy link
Contributor

This is a simple modification to pass chunk_store argument to the zarr package when opening Zarr datasets.

I don't think this requires any documentation changes and new tests.

@rsignell-usgs
Copy link

This PR is motivated by the work described in this Medium blog post

@rabernat
Copy link
Contributor

rabernat commented Feb 27, 2020

Fantastic, thanks @ajelenak so much for this contribution!

Two items come to mind:

  • this will definitely need a test, just something simple to verify that the chunk_store argument works as expected
  • curious what will happen if you pass an actual Zarr storage object. I assume it will work fine, since these are all mutable mappings.

@ajelenak
Copy link
Contributor Author

@rabernat Added a test that uses an actual Zarr store.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the update.

The new test is unfortunately failing with the error

E       TypeError: open() got multiple values for keyword argument 'chunk_store'

Could you look into this?

As noted in comment below, this test also makes me wonder whether we want a chunk_store argument in to_zarr as well. It might make sense to do them both as part of this PR.

@rabernat
Copy link
Contributor

Sorry for letting this hang so long. Now that zarr-developers/zarr-python#557 has been merged and zarr 2.4.0 has been released, we should be able to finish this off.

The easiest thing would be to just require zarr > 2.4.0. However, we currently don't require zarr at all...it's an optional dependency. The code in this PR will fail for older versions of zarr. What's the best way to handle this? Having a bunch of internal checks for zarr versions feels clunky, but we've done it before.

@dcherian
Copy link
Contributor

It'd be nice to add some documentation to https://xarray.pydata.org/en/stable/io.html but that can happen in a future PR.

@rsignell-usgs
Copy link

rsignell-usgs commented Jul 28, 2020

@dcherian , are we just waiting for one more "+1" here, or are the failing checks related to this PR?

dcherian added 2 commits July 28, 2020 12:33
* upstream/master: (36 commits)
  enable fail_on_warning for RTD (#4269)
  update the list item numbers in the release guide (#4264)
  remove the compatibility code in duck_array_ops.allclose_or_equiv (#4270)
  Un-xfail cftime plotting tests (#4272)
  xfail failing upstream plotting tests (#4271)
  Improve some error messages: apply_ufunc & set_options. (#4259)
  Support cupy in as_shared_dtype (#4232)
  Fix DataArray.copy documentation: remove confusing mention of 'dataset' (Gh3606) (#4245)
  Removed skipna argument from count, any, all [GH755] (#4258)
  Added xarrays-spatial and updated geoviews link (#4262)
  update docs to point to xarray-contrib and xarray-tutorial (#4252)
  Add release summary, some touch-ups (#4217)
  CFTimeIndex calendar in repr (#4092)
  fix the RTD timeouts (#4254)
  update isort CI and pre-commit hook (#4204)
  Add initial cupy tests (#4214)
  Add 0.16.0 release summary
  New whatsnew section
  Release v0.16.0
  Minor reorg of whatsnew for 0.16.0 (#4216)
  ...
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this LGTM as long as tests pass.

@rabernat?

@jhamman jhamman closed this Aug 5, 2020
@jhamman jhamman reopened this Aug 5, 2020
Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cycled the CI to (hopefully) fix the failing test. In the meantime, two other comments popped out at me.

@dcherian dcherian added topic-backends topic-zarr Related to zarr storage library labels Aug 14, 2020
…hunkstore

 Fixed conflicts in:
• xarray/core/dataset.py
• xarray/tests/test_sparse.py
@ajelenak
Copy link
Contributor Author

Is there anything else to be done in this PR, or is it ready for merge? All tests pass.

@dcherian
Copy link
Contributor

Looks good to me. I will merge in a couple of days if no one else has any comments.

@rsignell-usgs
Copy link

rsignell-usgs commented Aug 25, 2020

Drumroll.... @dcherian, epic cymbal crash?

@dcherian
Copy link
Contributor

Thanks for being patient @ajelenak I am excited to see this go in. welcome to xarray!

@dcherian dcherian merged commit 9c85dd5 into pydata:master Aug 25, 2020
@ajelenak
Copy link
Contributor Author

Thanks @dcherian . And thanks to all who provided suggestions and comments in this PR.

@ajelenak ajelenak deleted the zarr-chunkstore branch August 25, 2020 18:52
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
topic-backends topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants