
Added array optimization fuse notebook #89


Open
wants to merge 5 commits into
base: main

TomAugspurger
Member

@TomAugspurger TomAugspurger commented Jul 18, 2019

@alimanfoo alimanfoo left a comment

Thanks a lot for writing this up, very useful!

"metadata": {},
"outputs": [],
"source": [
"inputs_rechunked.blocks[0, :2].visualize(optimize_graph=True)"

Nice trick, didn't know about this :-)

@TomAugspurger
Member Author

Thanks @alimanfoo, I've applied your suggestions.

@mrocklin do you have high-level thoughts on this? Does this feel like we're just documenting a workaround to a weakness of Dask that we should instead be fixing?

@mrocklin
Member

mrocklin commented Jul 19, 2019 via email

@mrocklin
Member

mrocklin commented Jul 19, 2019 via email

@martindurant
Member

@TomAugspurger, did you have plans to try to make the story here more general?

@TomAugspurger
Member Author

TomAugspurger commented Jul 31, 2019 via email

@TomAugspurger
Member Author

@mrocklin question on the HLG fusion: would you expect adding additional
operations to the end of a task graph (e.g. .store) to potentially result in
more fusion earlier on? My guess is that extra tasks won't lead to more fusion
earlier on, but I may be misreading fuse.

I ask because when I look at just the creation / stacking / rechunking, we don't
get fusion with the default parameters:

import dask.array as da

inputs = [da.random.random(size=500_000, chunks=90_000)
          for _ in range(5)]
inputs_stacked = da.vstack(inputs)
inputs_rechunked = inputs_stacked.rechunk((50, 90_000))
inputs_rechunked.visualize(optimize_graph=True)

(image: the optimized task graph rendered by the `visualize` call above)

So unless adding a .store() to the end results in more fusion earlier on (in
the creation / stacking / rechunking phase), we won't be solving this use-case.
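
To make the guess above concrete, here is a toy sketch of purely linear fusion on a hand-written dependency dict. It is not dask's `fuse` (which also applies width-based heuristics that this ignores); the function `linear_fuse`, the task names, and the tiny graph are all hypothetical and only loosely mirror the create / stack / rechunk shape. Under this simplified rule, a parent and child merge only when the child has a single dependency and the parent has a single dependent, so a trailing store task can only merge with the task right before it.

```python
from collections import defaultdict


def linear_fuse(deps):
    """Group tasks into fused linear chains (toy model, not dask's `fuse`).

    `deps` maps each task name to the list of tasks it depends on.
    A parent merges into its child only when the child has exactly one
    dependency and the parent has exactly one dependent.
    """
    # Invert the dependency mapping: parent -> tasks that consume it.
    dependents = defaultdict(list)
    for key, parents in deps.items():
        for parent in parents:
            dependents[parent].append(key)

    # Record each parent that can be folded into its single child.
    merged_into = {}
    for child, parents in deps.items():
        if len(parents) == 1 and len(dependents[parents[0]]) == 1:
            merged_into[parents[0]] = child

    def chain_end(key):
        # Follow the merges until reaching the last task of the chain.
        while key in merged_into:
            key = merged_into[key]
        return key

    groups = defaultdict(list)
    for key in deps:
        groups[chain_end(key)].append(key)
    return {end: sorted(members) for end, members in groups.items()}


# Hypothetical graph: two created chunks, each stacked, then one rechunk
# task that combines both (fan-in), which blocks linear fusion.
graph = {
    "create-0": [],
    "create-1": [],
    "stack-0": ["create-0"],
    "stack-1": ["create-1"],
    "rechunk-0": ["stack-0", "stack-1"],
}

# The same graph with a store task appended at the end.
graph_with_store = dict(graph, **{"store-0": ["rechunk-0"]})

before = linear_fuse(graph)
after = linear_fuse(graph_with_store)

print(before)
print(after)
```

In this toy model the create/stack chains are grouped identically in both runs; the only difference is that the rechunk task merges with the appended store task. Whether dask's real optimizer behaves the same way is exactly the open question here.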

Base automatically changed from master to main January 27, 2021 16:07

4 participants