-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Complete Possible Join Type Handling for EmptyRelation Propagation Rule #10967
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Comments
Can i help with this one? |
Sounds good to me -- once #10963 is merged, I'll mark off the ones it updates, then you can take the rest? I took the easy ones on the first go 😅 |
Sounds good to me! |
I just saw @tshauck's PR was approved by 2 reviewers, so i think i can assume it will be merged in the future. I was looking at the remaining cases and wanted to brainstorm in case anyone wanted to give feedback on my thoughts before I try to implement them.
Case 3I think this one is trivial. Like tshauck did, just return an empty relation. Case 1 and 2Anti Joins result in rows which don't have any matching join column values in the other table. So in the case when there are Case 4 and 5I understand why this should be done this way and agree with it. But I don't know how to do this in the Logical Optimizer.
And Table B, which is empty
When we do
How can I implement this nicely? Thanks in advance to anyone who responds! |
For cases 4 and 5, this is kinda what you're getting at already, but there's also |
Hmm yeah, but it does sound better. Guess I'll have to try and test them out, maybe I'll stumble onto something else. Thanks for helping! |
You'll probably see otherwise, but #10963 merged. Please LMK if I can help on any of these. |
Yeah, I need some help with Cases 4 and 5. Creating a projection of the left or right child and extending its schema from the other table doesn't work. I first tried my suggestion, but that failed very quickly; then I tried yours, which got me closer. // For LeftOut/Full Join, if the right side is empty, the Join can be eliminated with a Projection with left side
// columns + right side columns replaced with null values.
JoinType::Full | JoinType::Left if right_empty => {
dbg!("here");
Ok(Transformed::yes(LogicalPlan::Projection(
Projection::try_new_with_schema(
join.schema
.columns()
.into_iter()
.map(|col| {
// transform columns from other schema into NULLs
if join.right.schema().is_column_from_schema(&col)
{
Expr::Literal(ScalarValue::Null)
} else {
Expr::Column(col)
}
})
.collect(),
join.left.clone(),
join.schema.clone(),
)?,
)))
} But when I test this out I get a SchemaError because they are different
I don't really know how to fix this. I can create a Draft PR if you want, so you can look at my code instead of this comment. Other waysI think that we have to ignore these cases, and if they do cause problems on larger tables, then we can optimize the join impl to handle empty join tables. I'm not sure though, I don't have much experience with this so I'm just looking at other possible solutions Let me know what you think! |
Interesting... to your later point, you might consider opening a DRAFT PR. And if it's not too much, maybe do an initial one for case 3 then stack the draft of 4 and 5 on it? Case 3 is probably easy to get reviewed and merged, then we can chat about 4 and 5 potentially with additional eyes. To the actual problem, the difference is obviously in the field qualifiers, so maybe there's a way to keep the right field qualifier from the column and/or update schema after it's done? |
Sorry, what do you mean by stack the draft for case 4 and 5 on top of the pr for 3? |
Like make a branch for case 3, open a PR for that, then branch off that one for the changes for cases 4 and 5. You could then rebase if case 3's PR got merged or if it makes more sense, you can merge the 4 and 5 branch into the case 3, then merge that into main... perhaps unnecessarily complicated, vs just two independent branches. Just thinking how to get the easy case merged while working through the more complex case. |
Alright, I've never done the rebase thing, it does sound nicer than 2 separate branches, I guess I can try it out now. You'll see the PR for cases 1, 2 and 3 tomorrow. |
So, I have been trying some different ways to make the last 2 cases work, but I can't find a simple, fast way. The problem lies in the fact that we can't project the left/right input with columns from the other input as null with the
But I do have to say, this is a lot of work for this specific use case (I don't know any other atm). If anyone has any tips or extra bits of knowledge, please share them, Thanks! Update: |
Is your feature request related to a problem or challenge?
Currently on
main
only theInner
join type is considered when doing EmptyRelation propagation, but it's possible to infer if an EmptyRelation is propagatable for some other join types for some other empty conditions.Describe the solution you'd like
Fill out the join types per the TODO here, add or remove any that do or do not make sense:
datafusion/datafusion/optimizer/src/propagate_empty_relation.rs
Lines 98 to 109 in a8847e1
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: