Avoid copying (so much) for LogicalPlan::map_children #9946

Closed
wants to merge 11 commits into from

alamb
Contributor

@alamb alamb commented Apr 4, 2024

Waiting on sorting out subquery handling in #9913

Which issue does this PR close?

Part of #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211

Rationale for this change

I am trying to make planning faster by not copying as much (I also want to reduce the number of allocations our planning does as we think it may be related to a concurrency bottleneck we are seeing downstream in IOx)

What changes are included in this PR?

Change LogicalPlan::map_children to rewrite the children in place without copying them. This uses a trick (hack?) to rewrite Arc<LogicalPlan> in place when possible.

This implements the suggestion by @peter-toth in #9780 (comment) and makes the existing tree node API faster.
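
A minimal sketch of the kind of in-place `Arc` rewrite described above. It assumes a toy `Plan` enum and a hypothetical helper named `rewrite_arc_in_place`; neither is DataFusion's actual code, and the real PR may differ in its details (for example, this sketch allocates a fresh placeholder on every call for simplicity). The idea it illustrates: temporarily take the `Arc` out from behind the `&mut`, use `Arc::try_unwrap` to get the inner value without cloning when no one else holds a reference, and fall back to a clone only when the `Arc` is shared.

```rust
use std::sync::Arc;

/// Toy stand-in for a plan node (hypothetical; not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Cheap value left behind while the real plan is being rewritten.
    Placeholder,
    Scan(String),
    Filter { predicate: String, input: Arc<Plan> },
}

/// Rewrite the plan behind `arc`, cloning it only if the `Arc` is shared.
/// The name and signature are assumptions made for this sketch.
fn rewrite_arc_in_place<F>(arc: &mut Arc<Plan>, f: F)
where
    F: FnOnce(Plan) -> Plan,
{
    // Step 1: take ownership of the Arc, leaving a placeholder behind so
    // that `*arc` stays valid while we work on the real value.
    let owned = std::mem::replace(arc, Arc::new(Plan::Placeholder));

    // Step 2: if this is the only reference, unwrap without copying;
    // otherwise clone the inner plan (other holders keep the original).
    let plan = Arc::try_unwrap(owned).unwrap_or_else(|shared| (*shared).clone());

    // Step 3: put the rewritten plan back where the placeholder was.
    *arc = Arc::new(f(plan));
}
```

With a uniquely owned `Arc`, the inner plan is moved into `f` rather than cloned. With a shared `Arc`, other holders still see the original plan and only the rewritten handle changes.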

Are these changes tested?

Functionally, this is covered by existing CI.

Performance tests: (slightly) faster than main (but this sets the stage for #9948, which goes much faster)

Details

group                                         main                                   optimizer_tree_node2
-----                                         ----                                   --------------------
logical_aggregate_with_join                   1.00  1220.2±14.00µs        ? ?/sec    1.00  1224.6±19.28µs        ? ?/sec
logical_plan_tpcds_all                        1.00    160.2±1.27ms        ? ?/sec    1.01    161.1±1.49ms        ? ?/sec
logical_plan_tpch_all                         1.03     17.5±0.16ms        ? ?/sec    1.00     17.0±0.25ms        ? ?/sec
logical_select_all_from_1000                  1.02     19.7±0.11ms        ? ?/sec    1.00     19.3±0.17ms        ? ?/sec
logical_select_one_from_700                   1.00   798.7±35.30µs        ? ?/sec    1.00   800.0±12.76µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   742.0±25.24µs        ? ?/sec    1.01   747.3±16.51µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   728.0±11.02µs        ? ?/sec    1.00   730.2±10.20µs        ? ?/sec
physical_plan_tpcds_all                       1.01  1899.1±14.88ms        ? ?/sec    1.00  1878.6±13.10ms        ? ?/sec
physical_plan_tpch_all                        1.04    124.3±1.76ms        ? ?/sec    1.00    118.9±1.24ms        ? ?/sec
physical_plan_tpch_q1                         1.03      7.5±0.07ms        ? ?/sec    1.00      7.3±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.00      5.7±0.05ms        ? ?/sec    1.00      5.6±0.07ms        ? ?/sec
physical_plan_tpch_q11                        1.01      5.0±0.08ms        ? ?/sec    1.00      4.9±0.06ms        ? ?/sec
physical_plan_tpch_q12                        1.01      4.0±0.03ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q13                        1.02      2.7±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_tpch_q14                        1.02      3.4±0.04ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_tpch_q16                        1.01      5.0±0.05ms        ? ?/sec    1.00      4.9±0.04ms        ? ?/sec
physical_plan_tpch_q17                        1.01      4.7±0.05ms        ? ?/sec    1.00      4.7±0.04ms        ? ?/sec
physical_plan_tpch_q18                        1.01      5.1±0.08ms        ? ?/sec    1.00      5.1±0.10ms        ? ?/sec
physical_plan_tpch_q19                        1.01      9.7±0.07ms        ? ?/sec    1.00      9.5±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.01     10.7±0.09ms        ? ?/sec    1.00     10.6±0.10ms        ? ?/sec
physical_plan_tpch_q20                        1.03      6.2±0.04ms        ? ?/sec    1.00      6.1±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.02      8.5±0.11ms        ? ?/sec    1.00      8.4±0.11ms        ? ?/sec
physical_plan_tpch_q22                        1.03      4.6±0.07ms        ? ?/sec    1.00      4.4±0.04ms        ? ?/sec
physical_plan_tpch_q3                         1.00      4.0±0.04ms        ? ?/sec    1.00      3.9±0.08ms        ? ?/sec
physical_plan_tpch_q4                         1.03      3.0±0.04ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.01      5.8±0.04ms        ? ?/sec    1.00      5.7±0.06ms        ? ?/sec
physical_plan_tpch_q6                         1.03      2.1±0.02ms        ? ?/sec    1.00      2.0±0.03ms        ? ?/sec
physical_plan_tpch_q7                         1.00      7.7±0.05ms        ? ?/sec    1.02      7.8±0.10ms        ? ?/sec
physical_plan_tpch_q8                         1.00      9.8±0.13ms        ? ?/sec    1.01      9.9±0.15ms        ? ?/sec
physical_plan_tpch_q9                         1.00      7.4±0.07ms        ? ?/sec    1.00      7.4±0.07ms        ? ?/sec
physical_select_all_from_1000                 1.01    129.6±0.45ms        ? ?/sec    1.00    127.7±0.44ms        ? ?/sec
physical_select_one_from_700                  1.00      4.1±0.03ms        ? ?/sec    1.01      4.1±0.04ms        ? ?/sec

Are there any user-facing changes?

TLDR is No.

This change alone doesn't change performance (largely because the TreeNodeRewriter isn't used in the optimizer passes yet). However, when combined with #9948 it makes planning 10% faster (and sets the stage for even more improvements)

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 4, 2024
@alamb alamb changed the title Optimzation: avoid copying (so much) for LogicalPlan::map_children Optimzation: avoid copying (so much) for LogicalPlan::map_children Apr 4, 2024
@alamb alamb force-pushed the alamb/map_in_place branch from dcdbe88 to 755342f Compare April 4, 2024 13:18
@alamb alamb changed the title Optimzation: avoid copying (so much) for LogicalPlan::map_children Avoid copying (so much) for LogicalPlan::map_children Apr 4, 2024
@alamb alamb force-pushed the alamb/map_in_place branch from 755342f to b4a9ffd Compare April 4, 2024 19:22
@alamb
Contributor Author

alamb commented Apr 4, 2024

Ok, I think once #9913 from @peter-toth is merged this PR will be ready to review

// specific language governing permissions and limitations
// under the License.

//! Methods for rewriting logical plans
Contributor Author

@peter-toth if you have time I would love to hear your thoughts on this change / API (no changes to TreeNode)

@peter-toth
Contributor

> Ok, I think once #9913 from @peter-toth is merged this PR will be ready to review

Sorry @alamb , I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping you.

@alamb
Contributor Author

alamb commented Apr 4, 2024

> Sorry @alamb , I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping you.

No worries! I am not blocked and have plenty of other things to entertain me at the moment

@alamb alamb force-pushed the alamb/map_in_place branch from 10448fb to e570e89 Compare April 6, 2024 10:37
@alamb alamb force-pushed the alamb/map_in_place branch from e570e89 to 12d4a8c Compare April 7, 2024 17:49
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Apr 7, 2024
/// `LogicalPlan`s, for example such as are in [`Expr::Exists`].
///
/// [`Expr::Exists`]: crate::expr::Expr::Exists
pub(crate) fn rewrite_children<F>(&mut self, mut f: F) -> Result<Transformed<()>>
Contributor Author

@peter-toth if you have a moment, I would love to hear any thoughts you have on this API (it is an in-place update to LogicalPlan but no change to TreeNode).

Questions:

  1. Do you think we need a similar one for rewrite_children_with_subqueries 🤔

Contributor

No, I don't think we need anything else. Your change to map_children() will affect all LogicalPlan::..._with_subqueries() APIs too. The trick in rewrite_arc() looks very good to me.

Contributor

@peter-toth peter-toth Apr 7, 2024

Actually, one thing just came to mind: as far as I can see, in rewrite_arc() you need an owned Arc<LogicalPlan> to call Arc::try_unwrap. But you only have &mut Arc<LogicalPlan>, and that's why you need std::mem::swap twice with that PLACEHOLDER.
If your rewrite_children()/map_children() worked with an owned Arc<LogicalPlan> instead of &mut Arc<LogicalPlan>, the swaps wouldn't be needed.
But in that case the implementation would be a bit more complex, like:

fn map_children<F: FnMut(Self) -> Result<Transformed<Self>>>(
    self,
    f: F,
) -> Result<Transformed<Self>> {
    Ok(match self {
        LogicalPlan::Projection(Projection { expr, input, schema }) => {
            rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Projection(Projection { expr, input, schema }))
        }
        LogicalPlan::Filter(Filter { predicate, input }) => {
            rewrite_arc(input, f)?.update_data(|input| LogicalPlan::Filter(Filter { predicate, input }))
        }
        ...
    })
}

Also, discard_data() wouldn't be required. BTW, this is how Expr::map_children() is implemented, but it works with Boxes, so its transform_box() implementation is simpler than this rewrite_arc().
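
For illustration, the owned-`Arc` approach suggested above could look roughly like the following. This is a self-contained sketch under stated assumptions, not the PR's code: `Plan` is a toy enum, `Transformed` is a simplified stand-in that only mimics the `update_data` method used in the snippet above, and errors are plain `String`s rather than DataFusion's error type. Because `rewrite_arc` receives the `Arc` by value, it can call `Arc::try_unwrap` directly, with no placeholder and no swaps.

```rust
use std::sync::Arc;

/// Toy stand-in for a plan node (hypothetical; not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Arc<Plan> },
}

/// Simplified stand-in for a "was anything changed?" wrapper (assumption).
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    /// Map the wrapped value while preserving the `transformed` flag.
    fn update_data<U>(self, f: impl FnOnce(T) -> U) -> Transformed<U> {
        Transformed { data: f(self.data), transformed: self.transformed }
    }
}

/// Rewrite an owned `Arc<Plan>`: unwrap it if unique, clone it if shared.
fn rewrite_arc<F>(arc: Arc<Plan>, mut f: F) -> Result<Transformed<Arc<Plan>>, String>
where
    F: FnMut(Plan) -> Result<Transformed<Plan>, String>,
{
    let plan = Arc::try_unwrap(arc).unwrap_or_else(|shared| (*shared).clone());
    // Re-wrap the rewritten plan in a new Arc, keeping the flag from `f`.
    Ok(f(plan)?.update_data(Arc::new))
}

/// Apply `f` to each direct child, taking and returning the plan by value.
fn map_children<F>(plan: Plan, f: F) -> Result<Transformed<Plan>, String>
where
    F: FnMut(Plan) -> Result<Transformed<Plan>, String>,
{
    Ok(match plan {
        // Destructure the node, rewrite its input, then rebuild the node.
        Plan::Filter { predicate, input } => rewrite_arc(input, f)?
            .update_data(|input| Plan::Filter { predicate, input }),
        // Leaf nodes have no children, so nothing changes.
        leaf => Transformed { data: leaf, transformed: false },
    })
}
```

The cost of this style is the one noted above: every variant has to be destructured and rebuilt by hand, whereas the `&mut` version can update a child field in place.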

Contributor

But I'm not sure about the cost of those 2 swaps, so it might not give any noticeable improvement...

Contributor Author

This is an excellent idea. I will do it

@alamb alamb changed the title Avoid copying (so much) for LogicalPlan::map_children Avoid copying (as much) for LogicalPlan::map_children Apr 8, 2024
@alamb alamb changed the title Avoid copying (as much) for LogicalPlan::map_children Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children Apr 8, 2024
@alamb alamb changed the title Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible Apr 8, 2024
@alamb
Contributor Author

alamb commented Apr 8, 2024

This PR is getting quite messy with history. I will make a new one

Update: #9999

@alamb alamb closed this Apr 8, 2024
@alamb alamb changed the title Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible Avoid copying (so much) for LogicalPlan::map_children Apr 8, 2024
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules
2 participants