[SPARK-16958] [SQL] Reuse subqueries within the same query #14548
Test build #63389 has finished for PR 14548 at commit
@@ -502,15 +508,64 @@ case class OutputFakerExec(output: Seq[Attribute], child: SparkPlan) extends Spa

/**
 * Physical plan for a subquery.
 *
 * This is used to generate tree string for SparkScalarSubquery.
 */
case class SubqueryExec(name: String, child: SparkPlan) extends UnaryExecNode {
A large part of this class is shared with BroadcastExchangeExec. Should we try to factor out common functionality?
I think it's OK to have some duplicated code here; over-abstracted code is actually harder to read.
@davies this looks pretty good. I am very excited about the SparkPlan clean-up!
@hvanhovell I have posted a picture, check it out.
Test build #63560 has finished for PR 14548 at commit
Test build #63563 has finished for PR 14548 at commit
Cool picture!
LGTM
Merging it into master, thanks!
## What changes were proposed in this pull request?
This code comes from PR #11190, but it has not been used since PR #14548. Let's continue and fix it. Thanks.
## How was this patch tested?
N/A
Closes #23227 from heary-cao/unuseSparkPlan.
Authored-by: caoxuewen <cao.xuewen@zte.com.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@davies @hvanhovell @gatorsmile In practice, though, the stage for the same subquery may execute more than once, as follows:
@JkSelf can you file a JIRA ticket?
@hvanhovell, thanks for your help. I have filed Jira 26639.
What changes were proposed in this pull request?
A query can contain multiple subqueries that produce the same result. Instead of running each of them, we can run one and reuse its result.
This PR also cleans up how we run subqueries.
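To illustrate the idea only, here is a minimal, self-contained sketch of subquery reuse. It is not the code in this PR: `Plan`, `Scan`, `Filter`, `Aggregate`, `Subquery`, and `reuse` are hypothetical toy names, and structural equality of case classes stands in for whatever "same result" check the real rule performs.

```scala
import scala.collection.mutable

object SubqueryReuseSketch {
  // Toy plan nodes (hypothetical; these are not Spark's SparkPlan classes).
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class Aggregate(expr: String, child: Plan) extends Plan

  // A subquery wraps a plan and computes its result lazily, at most once.
  final class Subquery(val name: String, val plan: Plan, run: Plan => Long) {
    lazy val result: Long = run(plan)
  }

  // Replace every subquery whose plan equals an earlier one with that
  // earlier instance, so equal plans share a single lazy result.
  def reuse(subqueries: Seq[Subquery]): Seq[Subquery] = {
    val seen = mutable.Map.empty[Plan, Subquery]
    subqueries.map(sq => seen.getOrElseUpdate(sq.plan, sq))
  }

  // Builds two structurally equal subqueries and returns
  // (number of executions, results seen by each reference).
  def demo(): (Int, Seq[Long]) = {
    var executions = 0
    val run: Plan => Long = { _ => executions += 1; 42L }
    def mkPlan: Plan = Aggregate("max(key)", Filter("key > 0", Scan("t")))

    val subqueries = Seq(
      new Subquery("subquery#1", mkPlan, run),
      new Subquery("subquery#2", mkPlan, run)
    )
    val results = reuse(subqueries).map(_.result)
    (executions, results)
  }
}
```

Without the `reuse` step each `Subquery` would evaluate its own plan; with it, both references point at the first instance, so the plan runs once.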
For SQL query
The explain is
The visualized plan:
How was this patch tested?
Existing tests.