Expression simplifier does not simplify A = B AND B = A
#8724
Comments
I think the idea of a
Using rules 1 and 2 would probably handle the case in this ticket. It would also help with things like Maybe we could add this simplification directly as a rule to ExprSimplifier |
Also, I recently implemented more general logic to extract "literal guarantees" for |
@Jefffrey if you agree with adding this directly as a rule to the ExprSimplifier, I think it would potentially make a good first project for someone |
Just a note, that this doesn't apply only for
So ideally before the
Yes this sounds good |
I am marking this as a good first issue, but it is really a medium-sized project. However, I think it is well specified and the existing code is straightforward to extend. The goal is to add this simplification directly to ExprSimplifier.
Canonicalize
First canonicalize any BinaryExprs so:
Remove redundancy
So for example I would expect the following to be simplified:
|
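The two steps described above (canonicalize, then remove redundancy) can be sketched on a toy expression type. This is only an illustration under stated assumptions: the `Expr` enum, the helper constructors, and the `canonicalize`, `split_conjunction`, and `simplify` functions are hypothetical stand-ins, not DataFusion's actual types or the ExprSimplifier API, and the "canonical order" used here is simply the derived `Ord` of the toy enum.

```rust
// Toy expression type for illustration only; NOT DataFusion's real `Expr`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Expr {
    Column(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Hypothetical helper constructors to keep the example short.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
fn eq(l: Expr, r: Expr) -> Expr {
    Expr::Eq(Box::new(l), Box::new(r))
}
fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

/// Step 1: put the operands of every `Eq` into one consistent order
/// (assumption: the derived `Ord` is good enough as that order).
fn canonicalize(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(l, r) => {
            let (l, r) = (canonicalize(*l), canonicalize(*r));
            if l <= r { eq(l, r) } else { eq(r, l) }
        }
        Expr::And(l, r) => and(canonicalize(*l), canonicalize(*r)),
        other => other,
    }
}

/// Flattens nested `And` nodes into a list of conjuncts.
fn split_conjunction(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjunction(*l, out);
            split_conjunction(*r, out);
        }
        other => out.push(other),
    }
}

/// Step 2: drop duplicate conjuncts (`A AND A` is just `A`) and rebuild.
fn simplify(expr: Expr) -> Expr {
    let mut conjuncts = Vec::new();
    split_conjunction(canonicalize(expr), &mut conjuncts);

    let mut unique: Vec<Expr> = Vec::new();
    for c in conjuncts {
        if !unique.contains(&c) {
            unique.push(c);
        }
    }

    let mut iter = unique.into_iter();
    let first = iter
        .next()
        .expect("split_conjunction yields at least one expression");
    iter.fold(first, and)
}
```

With these definitions, `simplify(and(eq(col("a"), col("b")), eq(col("b"), col("a"))))` collapses to a single `a = b` comparison, because both sides become structurally identical once canonicalized.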
I took a shot at writing up a description @Jefffrey -- let me know if that makes sense |
The canonicalize part sounds good. The remove redundancy, I think is already taken care of by expr_simplifier? e.g. |
Nice! I didn't know / remember that. |
After reading this issue, I'd like to give it a try. :) |
I submitted #8780 that may help this. |
@alamb Does it make sense to use IntervalArithmetics in this kind of optimization? It cannot handle strings yet, but I don't think it will take much effort. |
That is a good question @berkaysynnada and I am not sure what the best answer is. I think interval analysis has the most promise in filter analysis (aka proving that boolean expressions can not be However, I need to think more about how interval analysis could be used to simplify expressions 🤔 One challenge may be that DataFusion has a split between Logical |
Is your feature request related to a problem or challenge?
Given this case:
I would expect the final plan to simplify the FilterExec to only be wakana@0 = ookuma or ookuma = wakana@0, as the AND of these conditions is redundant.
Describe the solution you'd like
Possible solutions:
1. Add a rule that reorders the left and right fields of all BinaryExpr with op of Operator::Eq to a consistent order. It would be placed high in the rules list (during analysis?) to ensure downstream rules (like simplify_expressions) can properly determine this equality and simplify the condition
2. Change BinaryExpr to have a custom PartialEq/Eq implementation which disregards the order of its left/right fields when checking equality (only for Operator::Eq)
https://github.com/apache/arrow-datafusion/blob/9a6cc889a40e4740bfc859557a9ca9c8d043891e/datafusion/expr/src/expr.rs#L208-L217
Option 1 seems the cleaner solution, since we don't have to muck with a manual implementation of PartialEq/Eq
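To make the trade-off concrete, here is a rough sketch of what option 2 might involve. The `Operator`, `Expr`, and `BinaryExpr` types below are simplified stand-ins invented for this example and do not mirror DataFusion's actual definitions.

```rust
// Simplified stand-in types for illustration only; NOT DataFusion's definitions.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Operator {
    Eq,
    Lt,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Column(String),
    Binary(Box<BinaryExpr>),
}

// Note: no derived `PartialEq`/`Eq` here; they are written by hand below.
#[derive(Debug, Clone)]
struct BinaryExpr {
    left: Expr,
    op: Operator,
    right: Expr,
}

impl PartialEq for BinaryExpr {
    fn eq(&self, other: &Self) -> bool {
        if self.op != other.op {
            return false;
        }
        // Operands match in the order they were written.
        let same_order = self.left == other.left && self.right == other.right;
        // Only for `Operator::Eq` do swapped operands also count as equal.
        let swapped = self.op == Operator::Eq
            && self.left == other.right
            && self.right == other.left;
        same_order || swapped
    }
}

impl Eq for BinaryExpr {}
```

Under this sketch, `a = b` and `b = a` compare equal while `a < b` and `b < a` do not. One consequence of going this way is that any `Hash` implementation on the same type would also need to be written by hand so that equal values hash identically, which is part of why a manual implementation is more work than a canonicalizing rule.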
Describe alternatives you've considered
No response
Additional context
Found this while checking if #4732 can be closed