Skip to content

Fix bug in remove_join_expressions #11693

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 4 commits into from
Jul 30, 2024
Merged

Conversation

jonahgao
Copy link
Member

Which issue does this PR close?

Closes #11621.

Rationale for this change

Given an example query:

SELECT * FROM t1 JOIN t2 ON t1.v1 = t2.v1 WHERE (t1.v1 == t2.v1) OR t1.v1

When removing the join predicate t1.v1 = t2.v1 from the parent filter (t1.v1 == t2.v1) OR t1.v1, this process can be considered as replacing the join predicate with true. The result should be TRUE OR t1.v1 and further simplified to empty, but currently, it is t1.v1. This will lead to incorrect join results.

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jul 28, 2024
@@ -1046,6 +1046,19 @@ true false
true NULL
true NULL

# Test issue: https://github.com/apache/datafusion/issues/11621
query BB
SELECT * FROM t1 JOIN t2 ON t1.v1 = t2.v1 WHERE (t1.v1 == t2.v1) OR t1.v1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query returns an empty result on the current main.

@@ -1057,7 +1071,7 @@ mod tests {
"Filter: (t4.c < UInt32(15) OR t4.c = UInt32(688)) AND (t4.c < UInt32(15) OR t3.c = UInt32(688) OR t3.b = t4.b) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
" Inner Join: t3.a = t4.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
" Inner Join: t1.a = t3.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
" Filter: t2.c < UInt32(15) AND t2.c = UInt32(688) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
" Filter: t2.c = UInt32(688) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove t1.a = t2.a from (t1.a = t2.a OR t2.c < 15) AND (t1.a = t2.a AND tc.2 = 688), and the result should be tc.2 = 688

@jonahgao

This comment was marked as resolved.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. Thank you @jonahgao for the fix (and @2010YOUY01 for SQLancer that found it 🙏 )

jonahgao and others added 2 commits July 30, 2024 10:04
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@jonahgao jonahgao merged commit 2f5e73c into apache:main Jul 30, 2024
24 checks passed
@jonahgao
Copy link
Member Author

Thanks for the review @alamb

@jonahgao jonahgao deleted the remove_join_expr branch July 30, 2024 03:06
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect predicate evaluation result in a query (SQLancer-NoREC)
2 participants