Added array optimization fuse notebook #89

TomAugspurger · 2019-07-18T14:09:11Z

From dask/dask#5105.

https://mybinder.org/v2/gh/TomAugspurger/dask-examples/array-fuse (building an image now)

alimanfoo

Thanks a lot for writing this up, very useful!


alimanfoo · 2019-07-18T22:29:22Z

applications/array-optimization.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "inputs_rechunked.blocks[0, :2].visualize(optimize_graph=True)"


Nice trick, didn't know about this :-)
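For readers who haven't seen the trick: `Array.blocks` indexes by block rather than by element, so `.blocks[0, :2]` yields a small array made of only those blocks, and its graph stays small enough to visualize. A minimal sketch, assuming a toy 4x6 array split into 2x3 blocks (the array and sizes are illustrative, not from the notebook):

```python
import numpy as np
import dask.array as da

# A 4x6 array split into 2x3 blocks, i.e. a 2x2 grid of blocks
x = da.arange(24).reshape((4, 6)).rechunk((2, 3))

# Block-wise indexing: first block-row, first two blocks in that row
sub = x.blocks[0, :2]

print(x.numblocks)    # (2, 2)
print(sub.numblocks)  # (1, 2)
print(sub.shape)      # (2, 6)

# Same values as the equivalent element-wise slice
expected = np.arange(24).reshape((4, 6))[:2, :6]
np.testing.assert_array_equal(sub.compute(), expected)
```

Calling `sub.visualize(optimize_graph=True)` (requires graphviz) then renders only the tasks needed for those two blocks.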


TomAugspurger · 2019-07-19T20:40:07Z

Thanks @alimanfoo, I've applied your suggestions.

@mrocklin do you have high-level thoughts on this? Does this feel like we're just documenting a workaround to a weakness of Dask that we should instead be fixing?

mrocklin · 2019-07-19T22:16:03Z

Yes, to me this notebook seems perhaps overly specific to a single use case. I'm having trouble seeing how to generalize it to other situations. I think a general example of optimization would be useful. There are plenty of cases where this comes up, such as ML workloads where you really want X and y to be co-allocated. That case might also be a bit simpler.
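A minimal sketch of the X/y case, assuming a hypothetical feature matrix and target chunked along rows only. Matching row chunks is only the prerequisite: it lets block i of X be paired with block i of y in one task, while actually placing those blocks on the same worker is left to the scheduler.

```python
import dask
import dask.array as da

n_samples, n_features = 1_000, 10
# Hypothetical data: X and y chunked identically along the sample axis
X = da.random.random((n_samples, n_features), chunks=(100, n_features))
y = da.random.random(n_samples, chunks=100)

# Block i of X covers the same rows as block i of y
assert X.chunks[0] == y.chunks[0]

# Pair corresponding blocks as Delayed objects
X_blocks = X.to_delayed().ravel()
y_blocks = y.to_delayed().ravel()
pairs = list(zip(X_blocks, y_blocks))

# Each (X_i, y_i) pair can be consumed together by a single task
X0, y0 = dask.compute(*pairs[0])
print(len(pairs), X0.shape, y0.shape)  # 10 (100, 10) (100,)
```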


mrocklin · 2019-07-19T22:16:41Z

Although, in many of these cases, I think we can improve things just by extending Blockwise and HighLevelGraph operator fusion to data access operations.
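To make "Blockwise operator fusion" concrete: each elementwise operation on a Dask array records a Blockwise layer in the array's HighLevelGraph, and `dask.blockwise.optimize_blockwise` merges chains of such layers. A minimal sketch on a toy array (not the notebook's pipeline); the exact layer counts depend on the Dask version, so only the reduction is asserted.

```python
import dask.array as da
from dask.blockwise import optimize_blockwise
from dask.core import flatten
from dask.highlevelgraph import HighLevelGraph

x = da.ones((4, 4), chunks=2)
y = (x + 1) * 2  # two elementwise ops, each recorded as a Blockwise layer

graph = y.__dask_graph__()
keys = list(flatten(y.__dask_keys__()))

# Merge chains of Blockwise layers, keeping the output layer's keys intact
fused = optimize_blockwise(graph, keys=keys)

print(len(graph.layers), len(fused.layers))
print(y.compute())  # every entry is (1 + 1) * 2 == 4
```

Data access operations (reading from zarr, stacking, rechunking) are the part that, per the comment above, would still need to participate in this kind of fusion.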


martindurant · 2019-07-31T19:00:11Z

@TomAugspurger, did you have plans to try to make the story here more general?

TomAugspurger · 2019-07-31T19:06:33Z

Not at the moment.


TomAugspurger · 2019-08-01T15:49:43Z

@mrocklin, a question about the HLG fusion: would you expect that adding
operations to the end of a task graph (e.g. .store) could result in more
fusion earlier on? My guess is that extra tasks won't lead to more upstream
fusion, but I may be misreading fuse.
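For context on what fuse does, here is a minimal sketch of low-level linear fusion on a hand-written three-task graph, assuming `dask.optimization.fuse` is available in the installed Dask version. Fusion is driven by the graph's dependency structure plus the output `keys` passed in; this toy chain only shows the basic effect, not how downstream tasks such as `.store` change upstream decisions.

```python
from operator import add, mul

import dask
from dask.optimization import fuse

# A hand-written low-level graph forming a linear chain: a -> b -> c
dsk = {
    "a": 1,
    "b": (add, "a", 1),
    "c": (mul, "b", 10),
}

# Fuse linear chains ending at the requested output key "c";
# rename_keys=False keeps the output key name unchanged
fused, dependencies = fuse(dsk, keys=["c"], rename_keys=False)

print(sorted(fused))         # fewer keys than the 3 we started with
print(dask.get(fused, "c"))  # (1 + 1) * 10 == 20
```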

I ask because when I look at just the creation / stacking / rechunking, we don't
get fusion with the default parameters:

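```python
import dask.array as da

# Five 1-D random arrays, each split into 90_000-element chunks
inputs = [da.random.random(size=500_000, chunks=90_000)
          for _ in range(5)]
# Stack into a (5, 500_000) array, one row per input
inputs_stacked = da.vstack(inputs)
# 50 exceeds the 5 rows, so each new block spans all rows
inputs_rechunked = inputs_stacked.rechunk((50, 90_000))
# Render the optimized task graph (requires graphviz)
inputs_rechunked.visualize(optimize_graph=True)
```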

So unless adding a .store() to the end results in more fusion earlier on (in
the creation / stacking / rechunking phase), we won't be solving this use case.
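A scaled-down sketch of the full vstack → rechunk → store pipeline from dask/dask#5105, assuming an in-memory NumPy array as a stand-in for the zarr target and much smaller sizes than the snippet above (both are illustrative choices). Building the store step with `compute=False` gives a Delayed whose optimized graph can be compared against the graph of `inputs_rechunked` alone.

```python
import numpy as np
import dask.array as da

# Scaled-down version of the pipeline above (sizes are illustrative)
inputs = [da.random.random(size=50, chunks=9) for _ in range(5)]
inputs_rechunked = da.vstack(inputs).rechunk((50, 9))

# In-memory NumPy array standing in for the zarr target
target = np.zeros(inputs_rechunked.shape)

# Build the store step lazily; this returns a Delayed
stored = da.store(inputs_rechunked, target, compute=False)
# stored.visualize(optimize_graph=True)  # inspect fusion (requires graphviz)

stored.compute()  # writes every block into target
```

Because the random seeds are fixed when the graph is built, `inputs_rechunked.compute()` reproduces exactly the values written to `target`.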

TomAugspurger added 3 commits July 18, 2019 09:08

Added array optimimzation fuse notebook

ce45bfd

install zarr

d3a0095

install zarr

f6199e0

TomAugspurger mentioned this pull request Jul 18, 2019

Managing memory use for a simple vstack/rechunk/store pipeline dask/dask#5105

Closed

add to index

97fda16

alimanfoo reviewed Jul 18, 2019


updates

4affee9

Base automatically changed from master to main January 27, 2021 16:07
