-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Determine automatically if push join to table scan #6818
Determine automatically if push join to table scan #6818
Conversation
5513f17
to
92dbbb0
Compare
PlanNodeStatsEstimate joinStats = context.getStatsProvider().getStats(joinNode); | ||
PlanNodeStatsEstimate leftStats = context.getStatsProvider().getStats(joinNode.getLeft()); | ||
PlanNodeStatsEstimate rightStats = context.getStatsProvider().getStats(joinNode.getRight()); | ||
if (joinStats.isOutputRowCountUnknown() || leftStats.isOutputRowCountUnknown() || rightStats.isOutputRowCountUnknown()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if one of left output count or right count is known and larger than join output row count, why not pushdown join in such case as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - we could. Though it is strictly theoretical case. As if we do not know either left or right size. We would not know the join size :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, I missed that. Any particular reason for basing this on row count instead of size ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Probably size would be more appropriate. I will see how painful it is to change that.
i plan to review this once that one is merged |
92dbbb0
to
0e8b453
Compare
/** | ||
* Determine automatically if push join to connector | ||
*/ | ||
AUTOMATIC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is safe to make it the default
PlanNodeStatsEstimate joinStats = context.getStatsProvider().getStats(joinNode); | ||
PlanNodeStatsEstimate leftStats = context.getStatsProvider().getStats(joinNode.getLeft()); | ||
PlanNodeStatsEstimate rightStats = context.getStatsProvider().getStats(joinNode.getRight()); | ||
if (joinStats.isOutputRowCountUnknown() || leftStats.isOutputRowCountUnknown() || rightStats.isOutputRowCountUnknown()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since stats calculation can be costly (eg can involve a trip to metastore), short-circuit calculation as early as you can.
To keep this readable, please extract the condition to a separate method.
@@ -114,6 +115,19 @@ public Result apply(JoinNode joinNode, Captures captures, Context context) | |||
return Result.empty(); | |||
} | |||
|
|||
if (getJoinPushdownMode(context.getSession()) == JoinPushdownMode.AUTOMATIC) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally use a switch, and make it exhaustive, future proofing for the case when we add something like AUTOMATIC_EAGER (which we don't have to add yet, but we may want to add in the future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvrm, in this case it doesn't matter -- this is the only place the enum is used, so no way it gets forgotten and not updated
return Result.empty(); | ||
} | ||
|
||
if (joinStats.getOutputRowCount() > leftStats.getOutputRowCount() + rightStats.getOutputRowCount()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is not a rocket science, it'd be nice to add some comment, eg why we're choosing +
over max
.
from my perspective it was some 'random thought from findepi' (and i don't feel strongly), but still let's safe future readers suffering and try to word some explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some reasoning. Not sure if helpful
0e8b453
to
70680d7
Compare
ac |
@@ -135,7 +135,7 @@ | |||
private DataSize filterAndProjectMinOutputPageSize = DataSize.of(500, KILOBYTE); | |||
private int filterAndProjectMinOutputPageRowCount = 256; | |||
private int maxGroupingSets = 2048; | |||
private JoinPushdownMode joinPushdownMode = JoinPushdownMode.DISABLED; | |||
private JoinPushdownMode joinPushdownMode = JoinPushdownMode.AUTOMATIC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see conversation about code level documentation in the other pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment as a separate commit before introducing AUTOMATIC
mode.
@@ -114,6 +117,10 @@ public Result apply(JoinNode joinNode, Captures captures, Context context) | |||
return Result.empty(); | |||
} | |||
|
|||
if (getJoinPushdownMode(context.getSession()) == JoinPushdownMode.AUTOMATIC && !shouldProceedWithPushDown(joinNode, context)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the getJoinPushdownMode
should be consulted inside shouldProceedWithPushDown
(or you'd want to rename the method to indicate it's appropriate for "automatic" mode only)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the method to skipJoinPushdownBasedOnCost
(reversing true/false return value semantics), and moved getJoinPushdownMode(context.getSession()) == JoinPushdownMode.AUTOMATIC
inside.
Add "automatic" mode of join pushdown operation. In that mode join will only be pused down into table scan if statistics are available for join node and both source table scan nodes. And if expected numuber of rows coming out of join is less than total number of rows from both sources.
70680d7
to
2e3ebdf
Compare
@@ -135,16 +135,7 @@ | |||
private DataSize filterAndProjectMinOutputPageSize = DataSize.of(500, KILOBYTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if number of rows after pushdown is smaller then without pushdown it could significantly increase cpu overhead of underlying source (table scans might be much cheaper than join). I think it would be great to determine what's the impact of pushdown on underlying connectors. It could be that join pushdown is beneficial only when joins are very non selective and users don't want cpu of underlying connector to increase significantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Yet I would assume that you will still be able to disable pushdown on per-connector level in configuration. As well as per-query using session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally -- #6874 provides both catalog level config and session toggle.
return true; | ||
} | ||
|
||
if (joinOutputSize > leftOutputSize + rightOutputSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding some factor here, e.g pushed down join should produce 2x less rows than in trino. Such factor might need to be empirically established
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you mean to replace left + right
with max(left, right) * 0.5
? Works for me, given that the current formula is not very scientificly determined.
I think we should do "something reasonable" & iterate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I find initial value of a factor 1.0
as good as 0.5
On top of: #6752Review last commit only.