Added array optimization fuse notebook #89
Conversation
Thanks a lot for writing this up, very useful!
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"inputs_rechunked.blocks[0, :2].visualize(optimize_graph=True)" |
Nice trick, didn't know about this :-)
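For anyone reading along, a minimal, self-contained sketch of the trick being discussed (the shape and chunk sizes below are invented for illustration, not the notebook's): indexing `.blocks` selects only a few chunks, so the rendered task graph stays small enough to read.

```python
# Hypothetical example: the array here is not the notebook's data.
import dask.array as da

x = da.random.random((1_000, 1_000), chunks=(100, 100))
y = (x + 1).rechunk((250, 250))

# Visualize only the graph needed for the first two blocks of the first
# block-row, after graph optimization (rendering requires graphviz).
y.blocks[0, :2].visualize(optimize_graph=True)
```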
Thanks @alimanfoo, I've applied your suggestions. @mrocklin do you have high-level thoughts on this? Does this feel like we're just documenting a workaround to a weakness of Dask that we should instead be fixing?
Yes, to me this notebook seems perhaps overly-specific to a single use
case. I'm having trouble finding ways to generalize this notebook to other
situations. I think that a general example of optimization would be
useful. There are plenty of cases where this comes up, such as in ML
workloads where you really want X and y to be co-allocated. That case
might also be a bit simpler.
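As a rough illustration of that ML case (my sketch, not anything from the notebook): X and y are chunked along the sample axis, and the i-th block of X is only useful together with the i-th block of y, so you would like the tasks producing each pair to end up fused, or at least scheduled onto the same worker.

```python
# Illustrative only: shapes, chunk sizes, and names are made up.
import dask.array as da

X = da.random.random((1_000_000, 20), chunks=(100_000, 20))
y = da.random.random(1_000_000, chunks=100_000)

# Nothing in the graph ties X.blocks[i] to y.blocks[i]; the co-allocation
# question is how to make the scheduler (or an optimization pass) keep each
# corresponding pair of chunks together.
pairs = [(X.blocks[i], y.blocks[i]) for i in range(X.numblocks[0])]
```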
Although in general, in many of these cases I think that we can improve them just by expanding Blockwise and HighLevelGraph operator fusion out to data access operations.
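For context, a sketch of my own (not part of the proposed work) showing the layers that Blockwise/HighLevelGraph fusion reasons about; they can be inspected directly on any dask collection:

```python
# Illustrative only; layer names will differ across dask versions.
import dask.array as da

x = da.ones((100, 100), chunks=(10, 10))
y = (x + 1) * 2

hlg = y.__dask_graph__()   # a HighLevelGraph
print(list(hlg.layers))    # one name per layer, e.g. ones-*, add-*, mul-*
print(hlg.dependencies)    # which layers feed which
```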
@TomAugspurger, did you have plans to try to make the story here more general?
Not at the moment.
@mrocklin question on the HLG fusion: would you expect adding additional … I ask because when I look at just the creation / stacking / rechunking, we don't …

```python
import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000)
          for _ in range(5)]
inputs_stacked = da.vstack(inputs)
inputs_rechunked = inputs_stacked.rechunk((50, 90_000))

inputs_rechunked.visualize(optimize_graph=True)
```

So unless adding a …
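If it helps the comparison, here is a rough sketch (my own guess at the kind of workaround being discussed, not the notebook's exact code) of fusing the low-level graph by hand and drawing the result, building on `inputs_rechunked` from the snippet above:

```python
# Sketch only: assumes `inputs_rechunked` from the previous snippet.
from dask.optimization import fuse
from dask.dot import dot_graph

dsk = dict(inputs_rechunked.__dask_graph__())   # flatten the HighLevelGraph
keys = inputs_rechunked.__dask_keys__()         # output keys that must survive
fused, deps = fuse(dsk, keys=keys)

# Draw the fused low-level graph to compare against the un-fused version
# (rendering requires graphviz).
dot_graph(fused)
```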
From dask/dask#5105.
https://mybinder.org/v2/gh/TomAugspurger/dask-examples/array-fuse (building an image now)