-
Notifications
You must be signed in to change notification settings - Fork 1.5k
refactor: HashJoinStream
state machine
#8538
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
Benchmark results for tpch_mem are
|
cc @alamb @metesynnada PTAL, if you have some time and feel related / being in context of the issue |
I will review it soon, but it looks great. |
I also plan to review this soon. Thank you @korowa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @korowa -- I read this PR carefully. I have a few style suggestions, but I don't think any of them are requred.
As I undertand it, this is a step towards being able to incrementally generate output batches for joins.
I wanted to also say the comments are really nice and make this PR a joy to read 👏
🏆
cc @Dandandan and @liukun4515
I think it would be good to get @metesynnada 's review before merging this as well
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1475e8e
to
b164328
Compare
Yes, the plan is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on implementing the state machine - I really appreciate the effort you put into it! I'm glad to see that we now have a design coupling with SHJ. From what I can see, it doesn't look like there have been any changes in how we handle join, so I think it all looks good. Thanks for your hard work!
@Dandandan If this looks OK, we can merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the refactoring!
Thank you @korowa |
Which issue does this PR close?
Part of #8130.
Rationale for this change
Structuring HashJoinStream processing logic based on @alamb design suggestion from the issue.
What changes are included in this PR?
HashJoinStream::poll_next_impl
splitted into more granular functions which are used as a handlers, modifying stream state as their resultleft_fut
&visited_left_side
moved to HashJoinStream.build_side BuildSide attribute -- the reason for storing build-side related data separate from stream state, is that it allows to avoid redundant cloning of build side contents across all state changes (as all handlers operate on&mut self: HashJoinStream
) and still keeps build-side available for reading/mutating through references.stream_join_utils.rs
toutils.rs
+StreamJoinStateResult
renamed toStatefulStreamResult
-- seems reasonable as it is used for stateful streams and not only for stream-like joins -- any naming or rolling back related suggestions are welcome and appreciated.Are these changes tested?
Covered by existing test cases.
Are there any user-facing changes?
No.