Flink fix terminal streaming events #2768

Merged
merged 1 commit into from
Mar 15, 2024

Conversation

pawel-big-lebowski
Collaborator

@pawel-big-lebowski pawel-big-lebowski commented Mar 14, 2024

Problem

Marquez creates a new job version for a streaming job whenever the hash of the job version changes. We introduced this assumption because it makes sense most of the time. However, it does not make much sense for terminal events. In other words, a terminal event for a streaming job, such as complete, that contains no input or output datasets should only mean that the job has finished. It should not create a new job version, which is the current behaviour.
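
To make the intended rule concrete, here is a minimal sketch of the decision described above. It is not the code from this PR: the class, enum, and method names are hypothetical, dataset lists are simplified to strings, and treating COMPLETE, ABORT, and FAIL as the terminal event types is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: class, enum, and method names are hypothetical
// and are not Marquez's actual API.
final class TerminalEventSketch {

  // Run event types as named in OpenLineage.
  enum EventType { START, RUNNING, COMPLETE, ABORT, FAIL, OTHER }

  // Assumption: COMPLETE, ABORT, and FAIL are the terminal event types.
  static boolean isTerminal(EventType type) {
    return type == EventType.COMPLETE
        || type == EventType.ABORT
        || type == EventType.FAIL;
  }

  // Decides whether an event for a streaming job should produce a new job version.
  // Datasets are represented by their names to keep the sketch small.
  static boolean shouldCreateStreamingJobVersion(
      EventType type,
      List<String> inputs,
      List<String> outputs,
      boolean versionHashChanged) {
    boolean noDatasets = inputs.isEmpty() && outputs.isEmpty();
    if (isTerminal(type) && noDatasets) {
      // A terminal event without datasets only signals that the job has finished.
      return false;
    }
    // Otherwise keep the existing rule: a changed hash means a new version.
    return versionHashChanged;
  }

  public static void main(String[] args) {
    List<String> none = Collections.emptyList();
    List<String> someInputs = Arrays.asList("kafka.orders");

    // Terminal event, no datasets: no new version even though the hash differs.
    System.out.println(
        shouldCreateStreamingJobVersion(EventType.COMPLETE, none, none, true));

    // Non-terminal event carrying datasets: the hash change still applies.
    System.out.println(
        shouldCreateStreamingJobVersion(EventType.RUNNING, someInputs, none, true));
  }
}
```

Note that in this sketch a terminal event that does carry datasets still falls through to the hash comparison; whether that matches the desired behaviour is a separate design question.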

Closes: #2767

Solution

Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change.

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary:

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added api API layer changes docs labels Mar 14, 2024

netlify bot commented Mar 14, 2024

Deploy Preview for peppy-sprite-186812 canceled.

Name Link
🔨 Latest commit 5a0434c
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/65f41cf1b30d59000853ca8c


codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 90.00%, with 1 line in your changes missing coverage. Please review.

Project coverage is 84.47%. Comparing base (78a191b) to head (5a0434c).

Files Patch % Lines
...main/java/marquez/service/models/LineageEvent.java 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2768   +/-   ##
=========================================
  Coverage     84.46%   84.47%           
- Complexity     1415     1429   +14     
=========================================
  Files           251      251           
  Lines          6450     6460   +10     
  Branches        292      299    +7     
=========================================
+ Hits           5448     5457    +9     
  Misses          850      850           
- Partials        152      153    +1     

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Member

@wslulciuc wslulciuc left a comment

Thanks for improving our lineage support for streaming jobs, @pawel-big-lebowski! The inclusion of a terminal "state" provides a path to (hopefully) make better decisions about the current stage and all subsequent stages of a streaming job. I do feel we need to revisit this logic and document our reasoning, but let's first learn from real-world scenarios how the Marquez metadata model can be improved.

@wslulciuc wslulciuc merged commit 44bf397 into main Mar 15, 2024
16 checks passed
@wslulciuc wslulciuc deleted the streaming-fix branch March 15, 2024 15:25
@pawel-big-lebowski
Collaborator Author

@wslulciuc having the same feeling about this.

jonathanpmoraes pushed a commit to nubank/NuMarquez that referenced this pull request Feb 6, 2025
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Labels
api API layer changes docs

Successfully merging this pull request may close these issues.

Streaming jobs do not cumulate datasets sent through a run
2 participants