This repository was archived by the owner on Jan 7, 2025. It is now read-only.

feat: support left-outer and left-mark hash join impl rules #274

Open
wants to merge 12 commits into
base: main

Conversation

yliang412
Member

@yliang412 yliang412 commented Dec 22, 2024

Problem

We should be able to convert left-outer and left-mark logical equi-joins into hash joins.

Summary of changes

  • Add implementation rules to handle these cases.
  • Add join-split-filter rules that extract the predicates in a join condition that can be pushed down as filters.
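
As a rough illustration of what a join-split-filter rule has to decide, the sketch below tags each conjunct of a join condition by the input it references. `Expr`, `Side`, `side_of`, and `split_conjuncts` are hypothetical names made up for this example, not optd's actual types or rule code. Column references follow the plan dumps in this PR, where #i indexes the concatenated left and right schemas, and `left_width` is the assumed number of left-side columns.

```rust
// Hypothetical, simplified expression type; not optd's real representation.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(usize), // #i over the concatenated left + right schema
    Eq(Box<Expr>, Box<Expr>),
    NotLike(Box<Expr>, String),
    And(Vec<Expr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Left,
    Right,
    Both,
    Neither,
}

// Collect every column index referenced by an expression.
fn collect_cols(e: &Expr, out: &mut Vec<usize>) {
    match e {
        Expr::Col(i) => out.push(*i),
        Expr::Eq(a, b) => {
            collect_cols(a, out);
            collect_cols(b, out);
        }
        Expr::NotLike(a, _) => collect_cols(a, out),
        Expr::And(es) => {
            for x in es {
                collect_cols(x, out);
            }
        }
    }
}

// Decide which join input(s) an expression touches.
fn side_of(e: &Expr, left_width: usize) -> Side {
    let mut cols = Vec::new();
    collect_cols(e, &mut cols);
    let l = cols.iter().any(|&c| c < left_width);
    let r = cols.iter().any(|&c| c >= left_width);
    match (l, r) {
        (true, true) => Side::Both,
        (true, false) => Side::Left,
        (false, true) => Side::Right,
        (false, false) => Side::Neither,
    }
}

// Flatten nested And nodes and tag each conjunct with its side.
fn split_conjuncts(cond: &Expr, left_width: usize) -> Vec<(Side, Expr)> {
    match cond {
        Expr::And(es) => es
            .iter()
            .flat_map(|e| split_conjuncts(e, left_width))
            .collect(),
        other => vec![(side_of(other, left_width), other.clone())],
    }
}
```

For a left-outer join, only conjuncts tagged `Side::Right` can safely move below the join onto the right input. A `Side::Left` conjunct in the join condition has to stay where it is, because a left row that fails it must still be emitted with NULL padding. Which of these cases the rules in this PR actually cover is not stated here.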

misc

  • refactor simplify_log_expr to stop using unreachable!

Not ideal: I would like to unify the inner, left-outer, and left-mark cases into one rule.

Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
…achable!`

Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
@yliang412
Member Author

TPC-H Q13 needs a rule that splits a filter out of the join node; the join can then be turned into a left-outer hash join. Working on this now.

...
  ├── cond:And
  │   ├── Eq
  │   │   ├── #0
  │   │   └── #9
  │   └── Like { expr: #16, pattern: "%special%requests%", negated: true, case_insensitive: false }
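
To make the column arithmetic in this condition concrete, here is a minimal sketch of how the two conjuncts could be rewritten once the join is split. `equi_keys` and `rebase_right` are hypothetical helpers, not functions from this PR. The left width of 8 used in the usage note is an assumption based on the TPC-H customer table having eight columns.

```rust
/// Hypothetical helper: split Eq(#a, #b), expressed over the concatenated
/// schema, into (left_key, right_key) with the right key rebased onto the
/// right input. Returns None when both columns come from the same input.
fn equi_keys(a: usize, b: usize, left_width: usize) -> Option<(usize, usize)> {
    if a < left_width && b >= left_width {
        Some((a, b - left_width))
    } else if b < left_width && a >= left_width {
        Some((b, a - left_width))
    } else {
        None
    }
}

/// Hypothetical helper: rebase a right-side column reference so it is valid
/// directly above the right input. Returns None for a left-side column.
fn rebase_right(col: usize, left_width: usize) -> Option<usize> {
    col.checked_sub(left_width)
}
```

With a left width of 8, `equi_keys(0, 9, 8)` yields left key #0 and right key #1, and `rebase_right(16, 8)` yields #8, which matches the column referenced by the pushed-down filter on orders in the plan diff later in this conversation.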

yliang412 and others added 5 commits January 6, 2025 12:37
Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
We eliminate another nested loop join in TPC-H Q13

Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
├── PhysicalScan { table: customer }
└── PhysicalScan { table: orders }
└── PhysicalFilter { cond: Like { expr: #8, pattern: "%special%requests%", negated: true, case_insensitive: false } }
Member Author

This pushes the filter down and turns the join into a hash join.

├── cond:Eq
│ ├── #1
│ └── #14
└── PhysicalHashJoin { join_type: LeftMark, left_keys: [ #1 ], right_keys: [ #0 ] }
Member Author

picked up by the new hash-join-left-mark rule

│ └── PhysicalScan { table: part }
└── PhysicalScan { table: lineitem }
└── PhysicalProjection { exprs: [ #0, #2 ] }
└── PhysicalHashJoin { join_type: LeftOuter, left_keys: [ #0 ], right_keys: [ #0 ] }
Member Author

left-outer hash join

│ └── Eq
│ ├── #0
│ └── #1
└── PhysicalHashJoin { join_type: LeftOuter, left_keys: [ #0 ], right_keys: [ #0 ] }
Member Author

left-outer hash join

├── cond:Eq
│ ├── #0
│ └── #11
└── PhysicalHashJoin { join_type: LeftMark, left_keys: [ #0 ], right_keys: [ #0 ] }
Member Author

left-mark hash join

Comment on lines +114 to +126
└── PhysicalNestedLoopJoin
├── join_type: Inner
├── cond:And
│ ├── Gt
│ │ ├── Cast { cast_to: Float64, child: #2 }
│ │ └── #8
│ ├── Eq
│ │ ├── #0
│ │ └── #6
│ └── Eq
│ ├── #1
│ └── #7
├── PhysicalFilter { cond: #5 }
Member Author

This one is a little confusing. Investigating ...

@yliang412 yliang412 changed the title [WIP] support left-outer and left-mark hash join impl rules feat: support left-outer and left-mark hash join impl rules Jan 6, 2025
@yliang412 yliang412 marked this pull request as ready for review January 6, 2025 18:39
Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
(Join(JoinType::Inner), child_a, child_b)
);

define_rule!(
Member

I don't think this rule is correct. In some cases you cannot move an outer-join condition into a filter.

Consider select * from a left join b on a.x = b.y and b.z = 1. Its result differs from that of select * from a left join b on a.x = b.y where b.z = 1. Assume the left table has a single row with x=1 and the right table a single row with y=1, z=2: the correct result is (1, NULL, NULL), whereas the rule would produce zero rows.

Member

Ah, I realize now that this is a filter pushdown, so it might be correct. I will do a review later :)
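
The distinction raised in this thread can be checked with a small sketch. It uses a toy nested-loop left outer join over in-memory rows, not optd code, and every name in it (`left_join`, `on_clause`, `filter_above`, `filter_below`) is made up for the example. It compares three placements of the predicate b.z = 1: inside the ON clause, as a filter above the join, and as a filter pushed below the join onto the right input.

```rust
// Left rows carry x; right rows carry (y, z).
// An output row is (x, Option<(y, z)>), where None stands for NULL padding.
type Left = i32;
type Right = (i32, i32);
type Out = (Left, Option<Right>);

/// Nested-loop left outer join with an arbitrary ON predicate.
fn left_join(left: &[Left], right: &[Right], on: impl Fn(Left, Right) -> bool) -> Vec<Out> {
    let mut out = Vec::new();
    for &l in left {
        let mut matched = false;
        for &r in right {
            if on(l, r) {
                out.push((l, Some(r)));
                matched = true;
            }
        }
        if !matched {
            // An unmatched left row is kept and NULL-padded.
            out.push((l, None));
        }
    }
    out
}

/// Variant 1: a.x = b.y AND b.z = 1, both in the ON clause.
fn on_clause(left: &[Left], right: &[Right]) -> Vec<Out> {
    left_join(left, right, |x, (y, z)| x == y && z == 1)
}

/// Variant 2: join on a.x = b.y, then apply WHERE b.z = 1 above the join.
/// A NULL-padded row fails the WHERE predicate and is dropped.
fn filter_above(left: &[Left], right: &[Right]) -> Vec<Out> {
    left_join(left, right, |x, (y, _)| x == y)
        .into_iter()
        .filter(|(_, r)| matches!(r, Some((_, 1))))
        .collect()
}

/// Variant 3: apply b.z = 1 to the right input first, then join on a.x = b.y.
fn filter_below(left: &[Left], right: &[Right]) -> Vec<Out> {
    let filtered: Vec<Right> = right.iter().copied().filter(|&(_, z)| z == 1).collect();
    left_join(left, &filtered, |x, (y, _)| x == y)
}
```

On the reviewer's data (left row x=1, right row y=1, z=2), `on_clause` and `filter_below` both return the single NULL-padded row, while `filter_above` returns no rows. This agrees with the follow-up comment: for a predicate that references only the right input, pushing it below a left-outer join preserves the ON-clause semantics, whereas lifting it above the join does not.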

2 participants