Decorrelate scalar subqueries with more complex filter expressions #14554
Comments
take |
What do you think about implementing the more general approach to subquery unnesting described in that paper? I think @xudong963 mentioned he had done something similar before |
From what I see in the current code, this struct For that paper implementation, I'll try my best to find time and figure out which use cases DataFusion cannot yet support. I will need to do it in steps/PRs |
FWIW I think @xudong963 said he has experience implementing such code so perhaps he will be able to help / assist with the implementation and review |
Yes, please ping me @duongcongtoai in your PR |
This PR mentions several types of queries that need support
I'll start thinking about implementing unnesting for all of these use cases |
Hey @duongcongtoai, I want to draw your attention to a follow-up paper on "Unnesting Arbitrary Queries": https://15799.courses.cs.cmu.edu/spring2025/papers/11-unnesting/neumann-btw2025.pdf This paper improves on the original approach by dealing better with multiple nesting levels. It also describes the process in an algorithmic way that might be closer to the implementation. |
Thank you, I'll take a look at the PR |
I think we can break this story down into multiple steps:
|
I really like the idea of the incremental approach -- I think it is practically speaking the only one we are likely to be able to pull off. Thank you @duongcongtoai There are a bunch of related tickets listed on this epic: What do you think about creating a new ticket with the steps you outline above @duongcongtoai ? I am pretty sure others are interested in this feature as well and may be able to help |
I think we can reuse this ticket, right? #5492 |
Also, there is a newer paper for the topic: https://15799.courses.cs.cmu.edu/spring2025/papers/11-unnesting/neumann-btw2025.pdf |
Yes, that paper basically gives a pretty neat skeleton for a decorrelation framework |
|
I recommend we continue the discussion on |
Is your feature request related to a problem or challenge?
DataFusion already supports decorrelating simple scalar subqueries, added in PR #6457.
This follows the first approach in the TUM paper (simple unnesting) and allows decorrelating this simple query
However, if we add an OR condition to this subquery, DataFusion cannot decorrelate it
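To make the distinction concrete, here is a minimal sketch of simple unnesting using Python's built-in `sqlite3` module rather than DataFusion. The table contents, the `t2_int` column, and the `sum` aggregate are assumptions chosen for illustration; only the `t1_id`/`t2_id` naming is taken from this issue. With a single equality predicate, the correlated column can become a `GROUP BY` key on the inner side, followed by a left join.

```python
import sqlite3

# Hypothetical data: only the t1_id / t2_id / *_name naming comes from the issue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (t1_id INT, t1_name TEXT);
    CREATE TABLE t2 (t2_id INT, t2_name TEXT, t2_int INT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x', 10), (9, 'a', 5), (2, 'b', 7), (4, 'z', 100);
""")

# Correlated form: the subquery is evaluated once per outer row.
correlated_sql = """
    SELECT t1_id,
           (SELECT sum(t2_int) FROM t2 WHERE t2.t2_id = t1.t1_id) AS total
    FROM t1
    ORDER BY t1_id
"""

# Simple unnesting: the equality predicate turns into a GROUP BY key plus a
# LEFT JOIN, so the aggregate is computed once for all outer rows.
simple_unnested_sql = """
    SELECT t1.t1_id, agg.total
    FROM t1
    LEFT JOIN (SELECT t2_id, sum(t2_int) AS total FROM t2 GROUP BY t2_id) AS agg
      ON agg.t2_id = t1.t1_id
    ORDER BY t1.t1_id
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
unnested_rows = conn.execute(simple_unnested_sql).fetchall()
print(correlated_rows)
print(unnested_rows)
```

This trick relies on the predicate being a plain equality. With `t2.t2_id = t1.t1_id OR t2.t2_name = t1.t1_name`, a single `t2` row can match outer rows through either column, so there is no single grouping key on the `t2` side to aggregate by.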
Describe the solution you'd like
Support decorrelating this query following the second method mentioned in the paper
Describe alternatives you've considered
No response
Additional context
A general framework for decorrelation may be discussed in #5492
But the steps needed to make this work are as follows
Allow decorrelation for this type of filter expression in this code:
datafusion/datafusion/optimizer/src/decorrelate.rs
Line 162 in 813220d
Add more logic to handle complex query decorrelation:
t2.t2_id = domain.t1_id OR t2.t2_name = domain.t1_name
)
For example, the above-mentioned query may be rewritten like
The logical plan may look like
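As a rough illustration of the domain-based rewrite (this is not the issue's original query or plan), the sketch below uses Python's built-in `sqlite3` module instead of DataFusion. The data, the `t2_int` column, and the `sum` aggregate are assumptions. The `domain` name and the OR join condition follow the fragment above. The idea is to collect the distinct correlated values into `domain`, join `t2` against it with the OR predicate, aggregate per domain row, and join the result back to `t1`.

```python
import sqlite3

# Hypothetical data; t1 contains a duplicate row to show why domain is DISTINCT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (t1_id INT, t1_name TEXT);
    CREATE TABLE t2 (t2_id INT, t2_name TEXT, t2_int INT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (1, 'a');
    INSERT INTO t2 VALUES (1, 'x', 10), (9, 'a', 5), (2, 'b', 7), (4, 'z', 100);
""")

# Correlated form with an OR predicate, which simple unnesting cannot handle.
correlated_sql = """
    SELECT t1_id, t1_name,
           (SELECT sum(t2_int) FROM t2
            WHERE t2.t2_id = t1.t1_id OR t2.t2_name = t1.t1_name) AS total
    FROM t1
    ORDER BY t1_id, t1_name
"""

# Domain-based rewrite: evaluate the subquery once per distinct combination of
# correlated values, then join back to the outer table.
# `IS` is SQLite's null-safe equality, so NULL correlated values still match.
domain_rewrite_sql = """
    WITH domain AS (
        SELECT DISTINCT t1_id, t1_name FROM t1
    ),
    agg AS (
        SELECT domain.t1_id AS t1_id, domain.t1_name AS t1_name,
               sum(t2.t2_int) AS total
        FROM domain
        JOIN t2
          ON t2.t2_id = domain.t1_id OR t2.t2_name = domain.t1_name
        GROUP BY domain.t1_id, domain.t1_name
    )
    SELECT t1.t1_id, t1.t1_name, agg.total
    FROM t1
    LEFT JOIN agg
      ON t1.t1_id IS agg.t1_id AND t1.t1_name IS agg.t1_name
    ORDER BY t1.t1_id, t1.t1_name
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
rewritten_rows = conn.execute(domain_rewrite_sql).fetchall()
print(correlated_rows)
print(rewritten_rows)
```

The left join back to `t1` keeps outer rows whose domain value matched nothing in `t2`, which yields the same NULL that the correlated `sum` produces. An aggregate such as `count` would need extra handling here, since it returns 0 rather than NULL on empty input.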