Move filtered SMJ Full filtered join out of join_partial phase #13369

Merged
merged 3 commits into from
Nov 13, 2024

Conversation

comphead
Contributor

@comphead comphead commented Nov 12, 2024

Which issue does this PR close?

Closes #12359
Closes #10659

Rationale for this change

Move the filtered Full Outer sort-merge join (SMJ) out of the `join_partial` phase so that filter expressions are evaluated correctly and the filter results already evaluated for the same row are tracked.
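For readers unfamiliar with the semantics at stake, here is a minimal nested-loop sketch of what a filtered full outer join has to produce. It is not DataFusion code: `full_join_filtered`, the `(key, value)` tuple layout, and the sample filter are all hypothetical and exist only for illustration. The point is that a row is null-padded only after every candidate match for it has failed the filter, which is why the per-row filter history has to be tracked.

```rust
/// Reference semantics of a filtered FULL OUTER join (illustrative only).
/// `left` and `right` hold hypothetical (key, value) pairs; `filter` is the
/// extra join filter applied on top of key equality.
fn full_join_filtered(
    left: &[(i32, i32)],
    right: &[(i32, i32)],
    filter: impl Fn(i32, i32) -> bool,
) -> Vec<(Option<(i32, i32)>, Option<(i32, i32)>)> {
    let mut out = Vec::new();
    // Tracks which right rows passed the filter with at least one left row.
    let mut right_matched = vec![false; right.len()];

    for &l in left {
        let mut left_matched = false;
        for (j, &r) in right.iter().enumerate() {
            // A pair joins only if the keys are equal AND the filter passes.
            if l.0 == r.0 && filter(l.1, r.1) {
                out.push((Some(l), Some(r)));
                left_matched = true;
                right_matched[j] = true;
            }
        }
        // Every candidate failed: emit the left row exactly once, null-padded.
        if !left_matched {
            out.push((Some(l), None));
        }
    }

    // Right rows that never passed the filter are null-padded as well.
    for (j, &r) in right.iter().enumerate() {
        if !right_matched[j] {
            out.push((None, Some(r)));
        }
    }
    out
}
```

With `left = [(1, 10), (2, 5)]`, `right = [(1, 20), (1, 3), (2, 1)]` and the filter `l < r`, only the pair `(1, 10)`/`(1, 20)` joins; the remaining left row and the two remaining right rows each appear once with the other side empty.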

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the physical-expr, core, and sqllogictest labels on Nov 12, 2024
 async fn test_full_join_1k_filtered() {
     JoinFuzzTestCase::new(
         make_staggered_batches(1000),
         make_staggered_batches(1000),
         JoinType::Full,
         Some(Box::new(col_lt_col_filter)),
     )
-    .run_test(&[JoinTestType::NljHj], false)
+    .run_test(&[NljHj, HjSmj], false)
Contributor Author

works now, adding HJ vs SMJ test back
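The idea behind comparing two join implementations against each other can be sketched with toy code. The variant names `NljHj` and `HjSmj` suggest pairwise comparisons between nested-loop, hash, and sort-merge joins; the functions below are hypothetical stand-ins (plain inner equi-joins over `(key, value)` tuples), not the fuzz harness or any DataFusion API. Row order is not guaranteed to agree between algorithms, so both outputs are sorted before comparison.

```rust
use std::collections::HashMap;

/// Toy nested-loop inner join on the key (hypothetical, for illustration).
fn nested_loop_join(left: &[(i32, i32)], right: &[(i32, i32)]) -> Vec<(i32, i32, i32)> {
    let mut out = Vec::new();
    for &(lk, lv) in left {
        for &(rk, rv) in right {
            if lk == rk {
                out.push((lk, lv, rv));
            }
        }
    }
    out
}

/// Toy hash inner join on the key (hypothetical, for illustration).
fn hash_join(left: &[(i32, i32)], right: &[(i32, i32)]) -> Vec<(i32, i32, i32)> {
    // Build side: group right values by key.
    let mut table: HashMap<i32, Vec<i32>> = HashMap::new();
    for &(rk, rv) in right {
        table.entry(rk).or_default().push(rv);
    }
    // Probe side: look up each left key.
    let mut out = Vec::new();
    for &(lk, lv) in left {
        if let Some(vals) = table.get(&lk) {
            for &rv in vals {
                out.push((lk, lv, rv));
            }
        }
    }
    out
}

/// Two result sets agree if they contain the same rows, ignoring order.
fn same_rows(mut a: Vec<(i32, i32, i32)>, mut b: Vec<(i32, i32, i32)>) -> bool {
    a.sort();
    b.sort();
    a == b
}
```

A check then reduces to `same_rows(nested_loop_join(&l, &r), hash_join(&l, &r))` over generated inputs.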

#Alice 100 Alice 2
#Alice 50 NULL NULL
#Bob 1 NULL NULL
query TITI rowsort
Contributor Author

works now

let mut first_row_idx = 0;
let mut seen_false = false;

for i in 0..row_indices_length {
Contributor Author

The mask processing is more complex than for the other join types. I'm planning to add more tests and documentation, preferably in a follow-up PR, but if that's a blocker for the review I'll update this PR.
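To give a feel for why this mask is harder to build, here is a simplified sketch under stated assumptions; it is not the PR's `get_corrected_filter_mask`. The helper name `corrected_mask_sketch`, the plain `&[bool]` filter (no null filter results), and the three-valued output convention are all assumptions made for illustration, and only the streamed side is covered. The key point is that the decision for an earlier candidate can be revised once a later candidate for the same row passes the filter.

```rust
/// Hypothetical helper, not the PR's implementation.
/// `row_ids[i]` identifies the streamed row that candidate `i` belongs to;
/// candidates for the same row are assumed to be contiguous.
/// Result per candidate:
///   Some(true)  = emit the joined pair,
///   Some(false) = emit the streamed row null-padded,
///   None        = emit nothing for this candidate.
fn corrected_mask_sketch(row_ids: &[usize], filter: &[bool]) -> Vec<Option<bool>> {
    assert_eq!(row_ids.len(), filter.len());
    let n = filter.len();
    let mut mask: Vec<Option<bool>> = vec![None; n];
    let mut group_start = 0; // index of the first candidate of the current row
    let mut seen_true = false; // did any candidate of the current row pass?

    for (i, &pass) in filter.iter().enumerate() {
        if pass {
            seen_true = true;
            mask[i] = Some(true);
        } else if i == group_start {
            // Tentatively null-pad the row; this may be revoked below.
            mask[i] = Some(false);
        }

        let last_in_group = i + 1 == n || row_ids[i + 1] != row_ids[i];
        if last_in_group {
            // A later candidate passed, so the tentative null-padded output
            // written at the start of the group must be dropped.
            if seen_true && !filter[group_start] {
                mask[group_start] = None;
            }
            group_start = i + 1;
            seen_true = false;
        }
    }
    mask
}
```

For `row_ids = [0, 0, 0, 1, 1, 2]` and `filter = [false, true, false, false, false, true]`, row 0 keeps only its passing candidate, row 1 is emitted once null-padded, and row 2 is emitted as joined.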

@comphead comphead requested review from alamb and korowa November 12, 2024 01:20
@comphead
Contributor Author

During this work I found some opportunities to clean up, document, and improve testing for the existing SMJ code. I'm planning to file a separate PR for that.

@comphead
Contributor Author

cc @andygrove

@@ -852,6 +852,54 @@ fn get_corrected_filter_mask(
corrected_mask.extend(vec![Some(true); null_matched]);
Some(corrected_mask.finish())
}
JoinType::Full => {
let mut mask: Vec<Option<bool>> = vec![Some(true); row_indices_length];
Contributor

Could this use `BooleanBuilder`?

Contributor Author

@comphead comphead Nov 13, 2024

Unlike the other join types, this one needs to update entries already written to the array; the builder is append-only, AFAIK.

Contributor

@Dandandan Dandandan left a comment

👍

@Dandandan
Contributor

Nice work @comphead

@alamb
Contributor

alamb commented Nov 13, 2024

Awesome -- thank you @comphead -- the effort you are making to get Sort merge join into shape is very cool.

Thanks also to @Dandandan for the review

@alamb alamb merged commit fd092e0 into apache:main Nov 13, 2024
26 checks passed
alamb pushed a commit to alamb/datafusion that referenced this pull request Nov 13, 2024
…che#13369)

* Move filtered SMJ Full filtered join out of `join_partial` phase

* Move filtered SMJ Full filtered join out of `join_partial` phase

* Move filtered SMJ Full filtered join out of `join_partial` phase
Labels
core: Core DataFusion crate
physical-expr: Changes to the physical-expr crates
sqllogictest: SQL Logic Tests (.slt)
Projects
None yet
3 participants