Optimize SubtaskGraph generation #3342
base: master
c18d405 to 81244dc
491de95 to 8fb4545
# Note: `dtypes`, `index_value`, and `columns_value` are lazily
# initialized, so we should call property `params` to initialize
# these fields.
[o.params for o in out_chunks]
It's weird. What would happen without this code?
Without it, there would be no `columns_value` or `index_value`, both of which are used in MainPool.
These fields are lazily initialized, but fields `a` and `b` are both lazily initialized by `params`. Can you make `a` initialized by accessing `a`, and `b` initialized by accessing `b`? Then we can lazily initialize them in the Worker Main Pool.
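To illustrate the suggestion, here is a minimal sketch in which each field initializes itself on first access. `LazyChunkData`, its constructor arguments, and `init_log` are hypothetical names for illustration only; this is not Mars's actual chunk class, and it assumes Python 3.8+ for `functools.cached_property`.

```python
from functools import cached_property


class LazyChunkData:
    """Hypothetical stand-in for a chunk's metadata; not Mars's real class."""

    def __init__(self, raw_columns, raw_index):
        self._raw_columns = raw_columns
        self._raw_index = raw_index
        self.init_log = []  # records which fields were actually computed

    @cached_property
    def columns_value(self):
        # Computed once, on first access to this field only.
        self.init_log.append("columns_value")
        return tuple(self._raw_columns)

    @cached_property
    def index_value(self):
        # Computed once, on first access to this field only.
        self.init_log.append("index_value")
        return tuple(self._raw_index)

    @property
    def params(self):
        # Accessing params still forces every lazy field.
        return {
            "columns_value": self.columns_value,
            "index_value": self.index_value,
        }


chunk = LazyChunkData(["x", "y"], range(3))
_ = chunk.columns_value            # only columns_value is computed here
first_log = list(chunk.init_log)
_ = chunk.params                   # forces index_value; columns_value is cached
```

With this layout, code that only needs one field does not pay for the others, while `[o.params for o in out_chunks]` would still initialize everything if that turns out to be required.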
What do these changes do?
In `gen_subtask_graph`, Mars always creates new out chunks even if the out chunk already exists. This costs a lot of time when there are many chunks.
Related issue number
Fixes #3341
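The general idea of reusing an existing out chunk instead of creating a new one can be sketched as follows. This is only an illustration under assumptions: `Chunk`, `ChunkFactory`, and `get_or_create` are hypothetical names, and the real change in `gen_subtask_graph` may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for an out chunk; not Mars's real class."""
    key: str


class ChunkFactory:
    """Counts how many chunk objects are created under each strategy."""

    def __init__(self):
        self.created = 0
        self._existing = {}  # key -> chunk that was already built

    def new_chunk(self, key):
        # Always builds a fresh object, even if one exists for this key.
        self.created += 1
        return Chunk(key)

    def get_or_create(self, key):
        # Reuses the existing chunk for this key when there is one.
        chunk = self._existing.get(key)
        if chunk is None:
            chunk = self._existing[key] = self.new_chunk(key)
        return chunk


keys = ["a", "b", "a", "c", "b", "a"]

always_new = ChunkFactory()
for k in keys:
    always_new.new_chunk(k)

reusing = ChunkFactory()
chunks = [reusing.get_or_create(k) for k in keys]
```

In this toy input the first strategy builds six objects and the second builds three, which is the kind of saving that would grow with the number of chunks.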
I ran a comparison between two versions: one creates new out chunks and the other does not. The test scripts are:
The `Subtask` generation times are 122.92s and 56.63s, respectively.
Check code requirements