Skip to content

Unnest UNION plans (in addition to UNION ALL) #7786

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
alamb opened this issue Oct 10, 2023 · 3 comments · Fixed by #7788
Closed

Unnest UNION plans (in addition to UNION ALL) #7786

alamb opened this issue Oct 10, 2023 · 3 comments · Fixed by #7788
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Oct 10, 2023

Is your feature request related to a problem or challenge?

The current the plan for a nested UNION is still nested resulting in unecessary GROUPing

❯ explain SELECT 1 UNION (SELECT 1 UNION SELECT 1);
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                              |
|               |   Union                                                                                 |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |
|               |     Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                          |
|               |       Union                                                                             |
|               |         Projection: Int64(1)                                                            |
|               |           EmptyRelation                                                                 |
|               |         Projection: Int64(1)                                                            |
|               |           EmptyRelation                                                                 |

Describe the solution you'd like

As suggested by @jackwener: #7695 (comment), ideally the plan should look like the following (the unions should be unnested and run through a single group by):

❯ explain SELECT 1 UNION (SELECT 1 UNION SELECT 1);
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                              |
|               |   Union                                                                                 |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |

Describe alternatives you've considered

No response

Additional context

@maruschin added a nice optimization (as well as very nice tests) to unnest UNION ALL in #7695

❯ explain SELECT 1 UNION ALL (SELECT 1 UNION ALL SELECT 1);
+---------------+----------------------------------------+
| plan_type     | plan                                   |
+---------------+----------------------------------------+
| logical_plan  | Union                                  |
|               |   Projection: Int64(1) AS Int64(1)     |
|               |     EmptyRelation                      |
|               |   Projection: Int64(1) AS Int64(1)     |
|               |     EmptyRelation                      |
|               |   Projection: Int64(1) AS Int64(1)     |
|               |     EmptyRelation                      |
@maruschin
Copy link
Contributor

maruschin commented Oct 10, 2023

I added draft PR. There's a lot of code duplication. I’m thinking about how best to do it: combine this rule with the previous one or separate the common code into a function.

@alamb
Copy link
Contributor Author

alamb commented Oct 11, 2023

combine this rule with the previous one i

This seems natural to me if possible

@maruschin
Copy link
Contributor

maruschin commented Oct 11, 2023

Done, refactored nested union optimization, added tests.
Generalized nested distinct optimization: all distinct operations removed from distinct union, even select distinct.
I think that this case should not be done separately.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants