
pint support for Dataset #3975


Merged
merged 60 commits into from
Jun 17, 2020


keewis
Collaborator

@keewis keewis commented Apr 15, 2020

This is part of the effort to add support for pint (see #3594) to Dataset objects (although it will probably be a test-only PR, just like #3643).

  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

The list of failing tests from #3594:

  • Dataset methods
    • __init__: Needs unit support in IndexVariable, and merge does not work yet (test bug is also possible)
    • aggregation: xarray does not implement __array_function__ (see running numpy functions on xarray objects #3917)
    • rank: depends on bottleneck and thus only works with numpy.array
    • ffill, bfill: uses bottleneck
    • interpolate_na: uses numpy.vectorize, which does not support NEP-18 yet
    • equals, identical: works (but no units / unit checking in IndexVariable)
    • broadcast_like: works (but no units / unit checking in IndexVariable)
    • to_stacked_array: no units in IndexVariable
    • sel, loc: no units in IndexVariable
    • interp, reindex: partially blocked by IndexVariable. reindex works with units in data, but interp uses scipy
    • interp_like, reindex_like: same as interp / reindex
    • quantile: works, but needs pint >= 0.12
    • groupby_bins: needs pint >= 0.12 (for isclose)
    • rolling: uses numpy.lib.stride_tricks.as_strided
    • rolling_exp: uses numbagg (supports NEP-18, but pint doesn't support its functions)
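
Several entries in the list above hinge on NEP-18 dispatch (`__array_function__`). As a rough illustration only, the sketch below uses a hypothetical `UnitArray` class (a stand-in, not pint's `Quantity` and not xarray code) to show how a duck array can intercept `numpy.sum` and keep its unit, and how a NumPy function it does not handle fails instead of silently dropping the unit.

```python
import numpy as np


class UnitArray:
    """Hypothetical stand-in for a unit-aware duck array (not pint's Quantity)."""

    def __init__(self, magnitude, unit):
        self.magnitude = np.asarray(magnitude)
        self.unit = unit

    def __array_function__(self, func, types, args, kwargs):
        # NEP-18 hook: NumPy calls this instead of its own implementation.
        if func is np.sum:
            arr, *rest = args
            return UnitArray(np.sum(arr.magnitude, *rest, **kwargs), self.unit)
        # Anything else is declared unsupported, so NumPy raises TypeError.
        return NotImplemented


lengths = UnitArray([1.0, 2.0, 3.0], "m")

# Dispatched through __array_function__, so the unit survives.
total = np.sum(lengths)

# np.mean is not handled by this sketch, so NumPy refuses to compute it.
try:
    np.mean(lengths)
    mean_error = None
except TypeError as err:
    mean_error = err
```

Code paths that never go through this dispatch (compiled bottleneck routines, for example) see only a plain array, which is why those entries are listed as blocked.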

@keewis keewis changed the title from "pint support for datasets" to "pint support for Dataset" Apr 15, 2020
@dcherian dcherian mentioned this pull request May 5, 2020
23 tasks
@keewis
Collaborator Author

keewis commented May 27, 2020

It seems assert_allclose was one of the sources of UnitStrippedWarning, but because of a bug in pint's isclose (fixed on master), the tests now fail.

Edit: a new pint version will be released in the next few days, so most of the failing CI should pass after that.

Also, because assert_allclose cast to numpy, a few bugs were hidden.

@dcherian
Contributor

Thanks for working on this @keewis . Since there are no changes outside test_units.py, I think you should merge whenever you think this is ready.

@keewis
Collaborator Author

keewis commented Jun 17, 2020

I just found another issue: pint implements prod (but not yet nanprod) so the prod tests could be un-xfailed. However, we define a custom nanprod function that uses where to replace nan with 1. This won't work on quantities since, unlike nan and 0, a bare 1 cannot be put into quantities with a dimension (i.e. with a unit other than dimensionless).

I don't really understand the purpose of nanprod's min_count parameter (and _maybe_null_out) so I'm not sure how to fix that.

For now, I think we can merge this PR on green and I'll add that issue to the list in #3594.
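
For readers unfamiliar with the approach described above, here is a minimal NumPy-only sketch of a `where`-based nanprod. The function name `nanprod_sketch` is hypothetical and this is not xarray's implementation. The `min_count` handling reflects an assumed reading of the parameter: if fewer than `min_count` non-NaN values are present, the result is NaN.

```python
import numpy as np


def nanprod_sketch(values, min_count=None):
    """Product that skips NaN by replacing it with 1 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)

    # Replace NaN with 1, the multiplicative identity, so it has no effect.
    # With unit-aware data this bare 1 is dimensionless, which is the
    # incompatibility described in the comment above.
    filled = np.where(valid, values, 1.0)
    result = filled.prod()

    # Assumed min_count semantics: too few valid values gives NaN.
    if min_count is not None and valid.sum() < min_count:
        return np.nan
    return result


product = nanprod_sketch([2.0, np.nan, 3.0])
all_missing = nanprod_sketch([np.nan, np.nan], min_count=1)
```

Without `min_count`, an all-NaN input would yield 1.0, which may be why the null-out step exists; that reading is an assumption, not something stated in this thread.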

@dcherian
Contributor

Looks like an incompatibility with latest pandas

____________________ TestDataset.test_resample[int-coords] _____________________

self = <xarray.tests.test_units.TestDataset object at 0x7fa84fb9e490>
variant = 'coords', dtype = <class 'int'>
    return func(*all_args, **all_kwargs)
xarray/core/common.py:1123: in resample
    grouper = pd.Grouper(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'pandas.core.groupby.grouper.Grouper'>, args = ()
kwargs = {'base': 0, 'closed': None, 'freq': '6m', 'label': None, ...}
TimeGrouper = <class 'pandas.core.resample.TimeGrouper'>, stacklevel = 2

    def __new__(cls, *args, **kwargs):
        if kwargs.get("freq") is not None:
            from pandas.core.resample import TimeGrouper
    
            # Deprecation warning of `base` and `loffset` since v1.1.0:
            # we are raising the warning here to be able to set the `stacklevel`
            # properly since we need to raise the `base` and `loffset` deprecation
            # warning from three different cases:
            #   core/generic.py::NDFrame.resample
            #   core/groupby/groupby.py::GroupBy.resample
            #   core/groupby/grouper.py::Grouper
            # raising these warnings from TimeGrouper directly would fail the test:
            #   tests/resample/test_deprecated.py::test_deprecating_on_loffset_and_base
    
            # hacky way to set the stacklevel: if cls is TimeGrouper it means
            # that the call comes from a pandas internal call of resample,
            # otherwise it comes from pd.Grouper
            stacklevel = 4 if cls is TimeGrouper else 2
            if kwargs.get("base", None) is not None:
>               warnings.warn(
                    "'base' in .resample() and in Grouper() is deprecated.\n"
                    "The new arguments that you should use are 'offset' or 'origin'.\n"
                    '\n>>> df.resample(freq="3s", base=2)\n'
                    "\nbecomes:\n"
                    '\n>>> df.resample(freq="3s", offset="2s")\n',
                    FutureWarning,
                    stacklevel=stacklevel,
                )
E               FutureWarning: 'base' in .resample() and in Grouper() is deprecated.
E               The new arguments that you should use are 'offset' or 'origin'.
E               
E               >>> df.resample(freq="3s", base=2)
E               
E               becomes:
E               
E               >>> df.resample(freq="3s", offset="2s")

@keewis
Collaborator Author

keewis commented Jun 17, 2020

Yeah, I don't know how to filter for pint warnings only. I tried pytest.mark.filterwarnings("error:::pint[.*]"), but that doesn't work.

Edit: pytest.mark.filterwarnings("error::pint.UnitStrippedWarning") works so I'm merging.
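
The filter that worked selects by warning category rather than by module. The sketch below reproduces that idea with the standard-library `warnings` module only. `UnitStrippedWarning` here is a locally defined stand-in for `pint.UnitStrippedWarning`, and `run_with_category_filter` is a hypothetical helper, so treat this as an illustration of category-based filtering rather than the test-suite code.

```python
import warnings


class UnitStrippedWarning(UserWarning):
    """Hypothetical stand-in for pint.UnitStrippedWarning."""


def run_with_category_filter():
    with warnings.catch_warnings():
        # Baseline: silence every warning.
        warnings.simplefilter("ignore")
        # Escalate only the unit-stripping category to an exception.
        # simplefilter inserts at the front, so this rule is checked first.
        warnings.simplefilter("error", category=UnitStrippedWarning)

        # Unrelated warning (like the pandas FutureWarning): stays silent.
        warnings.warn("'base' is deprecated", FutureWarning)

        try:
            warnings.warn("units were stripped", UnitStrippedWarning)
        except UnitStrippedWarning as err:
            return str(err)
    return None


raised_message = run_with_category_filter()
```

In pytest's filter string the fields are `action:message:category:module:lineno`, so `"error::pint.UnitStrippedWarning"` leaves the message empty and sets only the category.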

@keewis keewis merged commit 66e7730 into pydata:master Jun 17, 2020
@keewis keewis deleted the pint-support-dataset branch June 17, 2020 20:40
@keewis keewis mentioned this pull request Jun 17, 2020
18 tasks
dcherian added a commit to TomNicholas/xarray that referenced this pull request Jun 24, 2020
…o-combine

* 'master' of github.com:pydata/xarray: (81 commits)
  use builtin python types instead of the numpy alias (pydata#4170)
  Revise pull request template (pydata#4039)
  pint support for Dataset (pydata#3975)
  drop eccodes in docs (pydata#4162)
  Update issue templates inspired/based on dask (pydata#4154)
  Fix failing upstream-dev build & remove docs build (pydata#4160)
  Improve typehints of xr.Dataset.__getitem__ (pydata#4144)
  provide a error summary for assert_allclose (pydata#3847)
  built-in accessor documentation (pydata#3988)
  Recommend installing cftime when time decoding fails. (pydata#4134)
  parameter documentation for DataArray.sel (pydata#4150)
  speed up map_blocks (pydata#4149)
  Remove outdated note from datetime accessor docstring (pydata#4148)
  Fix the upstream-dev pandas build failure (pydata#4138)
  map_blocks: Allow passing dask-backed objects in args (pydata#3818)
  keep attrs in reset_index (pydata#4103)
  Fix open_rasterio() for WarpedVRT with specified src_crs (pydata#4104)
  Allow non-unique and non-monotonic coordinates in get_clean_interp_index and polyfit (pydata#4099)
  update numpy's intersphinx url (pydata#4117)
  xr.infer_freq (pydata#4033)
  ...
dcherian added a commit to raphaeldussin/xarray that referenced this pull request Jul 1, 2020
* upstream/master: (21 commits)
  fix typo in error message in plot.py (pydata#4188)
  Support multiple dimensions in DataArray.argmin() and DataArray.argmax() methods (pydata#3936)
  Show data by default in HTML repr for DataArray (pydata#4182)
  Blackdoc (pydata#4177)
  Add CONTRIBUTING.md for the benefit of GitHub
  Correct dask handling for 1D idxmax/min on ND data (pydata#4135)
  use assert_allclose in the aggregation-with-units tests (pydata#4174)
  Remove old auto combine (pydata#3926)
  Fix 4009 (pydata#4173)
  Limit length of dataarray reprs (pydata#3905)
  Remove <pre> from nested HTML repr (pydata#4171)
  Proposal for better error message about in-place operation (pydata#3976)
  use builtin python types instead of the numpy alias (pydata#4170)
  Revise pull request template (pydata#4039)
  pint support for Dataset (pydata#3975)
  drop eccodes in docs (pydata#4162)
  Update issue templates inspired/based on dask (pydata#4154)
  Fix failing upstream-dev build & remove docs build (pydata#4160)
  Improve typehints of xr.Dataset.__getitem__ (pydata#4144)
  provide a error summary for assert_allclose (pydata#3847)
  ...