Fix Delta source initialization issue when using AvailableNow
When using AvailableNow, here are the flows for Delta source:

- New query: prepareForTriggerAvailableNow, (latestOffset -> getBatch)*.
- Restarted query: prepareForTriggerAvailableNow, getBatch, (latestOffset -> getBatch)*.

When restarting a query, getBatch must be called first. Otherwise, previousOffset will not be set, and latestOffset will assume it is a new query and return an incorrect offset. Today, we call latestOffset inside prepareForTriggerAvailableNow, which initializes lastOffsetForTriggerAvailableNow incorrectly because previousOffset is not yet set at that point when restarting a query.

In this PR, we add isTriggerAvailableNow and set it to true in prepareForTriggerAvailableNow without initializing lastOffsetForTriggerAvailableNow. The initialization of lastOffsetForTriggerAvailableNow now happens inside latestOffset (for a new query) or getBatch (for a restarted query), so that it is initialized correctly. We add an internal flag spark.databricks.delta.streaming.availableNow.offsetInitializationFix.enabled to allow users to switch back to the old behavior.

In addition, we add a validation for previousOffset and currentOffset to make sure they never move backward. This ensures that we do not cause data duplication even if there is a bug in offset generation. spark.databricks.delta.streaming.offsetValidation.enabled is added to allow users to turn off the check.

GitOrigin-RevId: a2684fbcdecf8f621fd3ce751f3e71fc96a17d7c
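
The two call orders described in the commit message can be illustrated with a small toy model. This is only a sketch and not the actual DeltaSource code: the class ToySource, the Offset tuple, the init_path field, and the table_head parameter are hypothetical names introduced for illustration, and the real offset computation is replaced by a fixed value. It shows deferred initialization of the AvailableNow bound (so a restarted query sees previousOffset first) and a check that offsets never move backward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class Offset:
    """Toy offset, ordered lexicographically by (version, index)."""
    version: int
    index: int


class ToySource:
    """Hypothetical stand-in for a streaming source; not the real DeltaSource."""

    def __init__(self, table_head: Offset, offset_validation: bool = True) -> None:
        self.table_head = table_head                # assumed latest data in the table
        self.offset_validation = offset_validation  # mirrors the offsetValidation flag
        self.is_trigger_available_now = False
        self.previous_offset: Optional[Offset] = None
        self.last_offset_for_trigger_available_now: Optional[Offset] = None
        self.init_path: Optional[str] = None        # records which path initialized the bound

    def prepare_for_trigger_available_now(self) -> None:
        # Only record the trigger mode; the bound is initialized lazily later.
        self.is_trigger_available_now = True

    def _init_last_offset_if_needed(self) -> None:
        if not self.is_trigger_available_now:
            return
        if self.last_offset_for_trigger_available_now is not None:
            return
        # The real computation branches on previous_offset; here we only record the branch.
        self.init_path = "new" if self.previous_offset is None else "restarted"
        self.last_offset_for_trigger_available_now = self.table_head

    def latest_offset(self) -> Offset:
        self._init_last_offset_if_needed()
        current = self.last_offset_for_trigger_available_now
        if current is None:
            current = self.table_head
        # Validation: the new offset must never be behind the previous one.
        if self.offset_validation and self.previous_offset is not None:
            if current < self.previous_offset:
                raise ValueError(f"offset moved backward: {self.previous_offset} -> {current}")
        return current

    def get_batch(self, start: Optional[Offset], end: Offset) -> None:
        # On restart the engine calls get_batch first, which sets previous_offset
        # before the AvailableNow bound is initialized.
        self.previous_offset = end
        self._init_last_offset_if_needed()


# New query: prepare, then latest_offset initializes the bound.
new_query = ToySource(Offset(10, 0))
new_query.prepare_for_trigger_available_now()
new_query.latest_offset()

# Restarted query: prepare, then get_batch initializes the bound.
restarted = ToySource(Offset(10, 0))
restarted.prepare_for_trigger_available_now()
restarted.get_batch(Offset(3, 0), Offset(5, 0))
restarted.latest_offset()
```

In this sketch, new_query.init_path ends up as "new" and restarted.init_path as "restarted", because prepare_for_trigger_available_now no longer computes the bound itself.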
1 parent ab0158d, commit 29d3a09
Showing 4 changed files with 229 additions and 4 deletions.