-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[SPARK-34081][SQL] Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join #31145
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133957 has finished for PR 31145 at commit
|
This also changes the plan of q14a, q14b, does it cause perf regression? I agree that it's unsure if pushing down left join through aggregate is beneficial or not, as both of them can reduce data volume. I have a simple heuristic: we look at the size metrics and see if the left join can be planned as broadcast join. If it can, then it's very likely that pushing it down is beneficial. |
if aggs.forall(_.deterministic) && groups.nonEmpty && | ||
!aggs.exists(ScalarSubquery.hasCorrelatedScalarSubquery) && | ||
!(cond.nonEmpty && groups.equals(aggs) && | ||
cond.forall(e => splitConjunctivePredicates(e).forall(_.isInstanceOf[EqualNullSafe]))) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a new method to JoinSelectionHelper
def canPlanAsBroadcastHashJoin(join: Join, conf: SQLConf): Boolean = {
getBroadcastBuildSide(join.left, join.right, join.joinType,
join.hint, hintOnly = true, conf).isDefined ||
getBroadcastBuildSide(join.left, join.right, join.joinType,
join.hint, hintOnly = false, conf).isDefined
}
and then use it here:
if ... && canPlanAsBroadcastHashJoin(join, conf)
No. |
@@ -334,7 +334,7 @@ Results [3]: [brand_id#13, class_id#14, category_id#15] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to change this file
@@ -319,7 +319,7 @@ Results [3]: [brand_id#13, class_id#14, category_id#15] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to change this file.
@@ -174,7 +174,7 @@ Results [3]: [c_last_name#17, c_first_name#16, d_date#14] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -174,7 +174,7 @@ Results [3]: [c_last_name#17, c_first_name#16, d_date#14] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -319,7 +319,7 @@ Results [3]: [brand_id#13, class_id#14, category_id#15] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Since it changes the final plan of 4 TPCDS queries, can we put the benchmark result for all of these 4 queries even though some of them have no perf change? |
Yes. I have put the benchmark results to pr description. |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #134015 has finished for PR 31145 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
Should not pushdown LeftSemi/LeftAnti over Aggregate for some cases.
Before this pr:
After this pr:
Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test and benchmark test.
Before this pr:

After this pr:
