-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Avoid copying (so much) for LogicalPlan::map_children #9946
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
LogicalPlan::map_children
dcdbe88
to
755342f
Compare
LogicalPlan::map_children
LogicalPlan::map_children
755342f
to
b4a9ffd
Compare
Ok, I think once #9913 from @peter-toth is merged this PR will be ready to review |
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! Methods for rewriting logical plans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peter-toth if you have time I would love to hear your thoughts on this change / API (no changes to TreeNode)
Sorry @alamb , I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping you. |
10448fb
to
e570e89
Compare
…nt-logicalplan-subquery-handling # Conflicts: # datafusion/optimizer/src/analyzer/mod.rs
…n-subquery-handling
e570e89
to
12d4a8c
Compare
/// `LogicalPlan`s, for example such as are in [`Expr::Exists`]. | ||
/// | ||
/// [`Expr::Exists`]: crate::expr::Expr::Exists | ||
pub(crate) fn rewrite_children<F>(&mut self, mut f: F) -> Result<Transformed<()>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peter-toth if you have a moment, I would love to hear any thoughts you have on this API (it is an in-place update to LogicalPlan
but no change to TreeNode
Questions:
- Do you think we need a similar one for
rewrite_children_with_subqueries
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think we need anything else. Your change to map_children()
will affect all LogicalPlan::..._with_subqueries()
APIs too. The trick in rewrite_arc()
looks very good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, just one thing came into my mind: as far as I see in rewrite_arc()
you need an owned Arc<LogicalPlan>
to call Arc::try_unwrap
. But, you only have &mut Arc<LogicalPlan>
and that's why you need std::mem::swap
2 times with that PLACEHOLDER
.
If your rewrite_children()
/map_children()
worked with owned Arc<LogicalPlan>
and not with &mut Arc<LogicalPlan>
the swap wouldn't be needed.
But in that case the implementation would be a bit complex like:
fn map_children<F: FnMut(Self) -> Result<Transformed<Self>>>(
self,
f: F,
) -> Result<Transformed<Self>> {
Ok(match self {
LogicalPlan::Projection(Projection { expr, input, schema }) => {
rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Projection(Projection { expr, input, schema }))
}
LogicalPlan::Filter(Filter { predicate, input }) => {
rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Filter(Filter { predicate, input }))
}
...
})
}
Also discard_data()
won't be required. BTW, this is how Expr::map_children()
is implemented but there are Box
es so the transform_box()
implementation is simpler than this rewrite_arc()
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, I'm not sure about the cost of those 2 swaps, so it might not give any noticable improvement...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent idea. I will do it
LogicalPlan::map_children
LogicalPlan::map_children
LogicalPlan::map_children
LogicalPlan::clone()
in LogicalPlan::map_children
LogicalPlan::clone()
in LogicalPlan::map_children
LogicalPlan::clone()
in LogicalPlan::map_children
when possible
This PR is getting quite messy with history. I will make a new one Update: #9999 |
LogicalPlan::clone()
in LogicalPlan::map_children
when possible
Waiting on sorting out subquery handling in #9913
Which issue does this PR close?
Part of #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211
Rationale for this change
I am trying to make planning faster by not copying as much (I also want to reduce the number of allocations our planning does as we think it may be related to a concurrency bottleneck we are seeing downstream in IOx)
What changes are included in this PR?
Change
LogicalPlan::map_children
to rewrite the children in place without copying them. This uses a trick(hack?) to rewriteArc<LogicalPlan>
in place when possibleThis Implement suggestion by @peter-toth #9780 (comment) on #9780 and make the existing tree node API faster
Are these changes tested?
Functionally, this is covered by existing CI.
Performance tests: (slightly) faster than from main (but sets the stage for #9948 which goes much faster)
Details
Are there any user-facing changes?
TLDR is No.
This change alone doesn't change performance (largely because the TreeNodeRewriter isn't used in the optimizer passes yet). However, when combined with #9948 it makes planning 10% faster (and sets the stage for even more improvements)