Flink fix terminal streaming events #2768
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2768 +/- ##
=========================================
Coverage 84.46% 84.47%
- Complexity 1415 1429 +14
=========================================
Files 251 251
Lines 6450 6460 +10
Branches 292 299 +7
=========================================
+ Hits 5448 5457 +9
Misses 850 850
- Partials 152 153 +1
☔ View full report in Codecov by Sentry.
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Thanks for improving our lineage support for streaming jobs, @pawel-big-lebowski! The inclusion of a terminal "state" provides a path to (hopefully) make better decisions about the current stage and all subsequent stages of a streaming job. I do feel we need to revisit this logic and document our reasoning, but let's first learn from real-world scenarios how the Marquez metadata model can be improved.
@wslulciuc I have the same feeling about this.
Problem
Marquez creates a new job version for a streaming job whenever the hash of the job version changes. We introduced this assumption because it makes sense most of the time. However, it does not make much sense for terminal events. In other words, a terminal event for a streaming job, such as complete, that contains no input or output datasets should only mean that the job has finished. It should not create a new job version, which is the current behaviour.
Closes: #2767
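The intended rule can be sketched roughly as follows. This is a minimal illustration in Python, not the Marquez implementation: `RunEvent`, `TERMINAL_STATES`, and `creates_new_job_version` are hypothetical names, and treating `COMPLETE`, `FAIL`, and `ABORT` as the terminal states is an assumption based on the OpenLineage event types.

```python
from dataclasses import dataclass

# Assumption: these event types count as terminal for a streaming job.
TERMINAL_STATES = {"COMPLETE", "FAIL", "ABORT"}


@dataclass(frozen=True)
class RunEvent:
    """Hypothetical, simplified view of a lineage event for a streaming job."""
    event_type: str
    inputs: tuple = ()
    outputs: tuple = ()


def creates_new_job_version(event, current_hash, new_hash):
    """Decide whether a streaming-job event should produce a new job version."""
    is_terminal = event.event_type.upper() in TERMINAL_STATES
    has_no_datasets = not event.inputs and not event.outputs
    if is_terminal and has_no_datasets:
        # The job has only finished; its lineage did not change.
        return False
    # Otherwise keep the existing rule: a changed hash means a new version.
    return new_hash != current_hash


# A bare terminal event leaves the current version untouched,
# even though the computed hash differs.
print(creates_new_job_version(RunEvent("COMPLETE"), "abc", "def"))
# A running event with datasets and a changed hash still creates a version.
print(creates_new_job_version(RunEvent("RUNNING", inputs=("topic_a",)), "abc", "def"))
```

The early return matters because an event without datasets would otherwise hash differently from the current version and be mistaken for a lineage change.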
Solution
Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change.
One-line summary:
Checklist
CHANGELOG.md (depending on the change, this may not be necessary)
.sql database schema migration according to Flyway's naming convention (if relevant)