-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
dask.optimize on xarray objects #3698
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Comments
It looks like xarray is getting a bad task graph after the optimize. In [1]: import xarray as xr
import dask
In [2]: import dask
In [3]: a = dask.array.ones((10,5), chunks=(1,3))
...: a = dask.optimize(a)[0]
In [4]: da = xr.DataArray(a.compute()).chunk({"dim_0": 5})
...: da = dask.optimize(da)[0]
In [5]: dict(da.__dask_graph__())
Out[5]:
{('xarray-<this-array>-e2865aa10d476e027154771611541f99',
1,
0): (<function _operator.getitem(a, b, /)>, 'xarray-<this-array>-e2865aa10d476e027154771611541f99', (slice(5, 10, None),
slice(0, 5, None))),
('xarray-<this-array>-e2865aa10d476e027154771611541f99',
0,
0): (<function _operator.getitem(a, b, /)>, 'xarray-<this-array>-e2865aa10d476e027154771611541f99', (slice(0, 5, None),
slice(0, 5, None)))} Notice that are references to If we manually insert that, you'll see things work In [9]: dsk['xarray-<this-array>-e2865aa10d476e027154771611541f99'] = da._to_temp_dataset()[xr.core.dataarray._THIS_ARRAY]
In [11]: dask.get(dsk, keys=[('xarray-<this-array>-e2865aa10d476e027154771611541f99', 1, 0)])
Out[11]:
(<xarray.DataArray <this-array> (dim_0: 5, dim_1: 5)>
dask.array<getitem, shape=(5, 5), dtype=float64, chunksize=(5, 5), chunktype=numpy.ndarray>
Dimensions without coordinates: dim_0, dim_1,) |
FYI, @dcherian your recent PR to dask fixed this example. Playing around with chunk sizes, it seems to have fixed it even when the chunk size exceeds |
I guess I can see that. Thanks Tom.
FYI the slicing behaviour is independent of chunk-size (matt's recommendation). |
The numpy example is fixed but the dask rechunked example is still broken. a = dask.array.ones((10,5), chunks=(1,3))
dask.optimize(xr.DataArray(a))[0].compute() # works
dask.optimize(xr.DataArray(a).chunk(5))[0].compute() # error
|
Thanks for confirming. I'll take another look at this today then.
…On Thu, Sep 10, 2020 at 10:30 AM Deepak Cherian ***@***.***> wrote:
Reopened #3698 <#3698>.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3698 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOIT6LDBKVUQ5KR7VFB3SFDWI3ANCNFSM4KHH63GQ>
.
|
Previously we generated in invalidate Dask task graph, becuase the lines removed here dropped keys that were referenced elsewhere in the task graph. The original implementation had a comment indicating that this was to cull: https://github.com/pydata/xarray/blame/502a988ad5b87b9f3aeec3033bf55c71272e1053/xarray/core/variable.py#L384 Just spot-checking things, I think we're OK here though. Something like `dask.visualize(arr[[0]], optimize_graph=True)` indicates that we're OK. Closes pydata#3698
Previously we generated in invalidate Dask task graph, becuase the lines removed here dropped keys that were referenced elsewhere in the task graph. The original implementation had a comment indicating that this was to cull: https://github.com/pydata/xarray/blame/502a988ad5b87b9f3aeec3033bf55c71272e1053/xarray/core/variable.py#L384 Just spot-checking things, I think we're OK here though. Something like `dask.visualize(arr[[0]], optimize_graph=True)` indicates that we're OK. Closes pydata#3698
Previously we generated in invalidate Dask task graph, becuase the lines removed here dropped keys that were referenced elsewhere in the task graph. The original implementation had a comment indicating that this was to cull: https://github.com/pydata/xarray/blame/502a988ad5b87b9f3aeec3033bf55c71272e1053/xarray/core/variable.py#L384 Just spot-checking things, I think we're OK here though. Something like `dask.visualize(arr[[0]], optimize_graph=True)` indicates that we're OK. Closes #3698 Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Another attempt to fix pydata#3698. The issue with my fix in is that we hit `Variable._dask_finalize` in both `dask.optimize` and `dask.persist`. We want to do the culling of unnecessary tasks (`test_persist_Dataset`) but only in the persist case, not optimize (`test_optimize`).
Another attempt to fix pydata#3698. The issue with my fix in is that we hit `Variable._dask_finalize` in both `dask.optimize` and `dask.persist`. We want to do the culling of unnecessary tasks (`test_persist_Dataset`) but only in the persist case, not optimize (`test_optimize`).
* Fixed dask.optimize on datasets Another attempt to fix #3698. The issue with my fix in is that we hit `Variable._dask_finalize` in both `dask.optimize` and `dask.persist`. We want to do the culling of unnecessary tasks (`test_persist_Dataset`) but only in the persist case, not optimize (`test_optimize`). * Update whats-new.rst * Update doc/whats-new.rst Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
I am trying to call
dask.optimize
on a xarray object before the graph gets too big. But get weird errors. Simple examples below. All examples work if I remove thedask.optimize
step.cc @mrocklin @shoyer
This works with dask arrays:
It works when a dataArray is constructed using a dask array
but fails when creating a DataArray with a numpy array and then chunking it
🤷♂️
fails with error
And a different error when rechunking a dask-backed DataArray
The text was updated successfully, but these errors were encountered: