[Spark] Skip reading log entries beyond endOffset, if specified while getting file changes for CDC in streaming queries #3110
Which Delta project/connector is this regarding?
Description
When an endOffset is specified, skip reading log entries beyond it while getting file changes for CDC in streaming queries.
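The idea can be illustrated with a minimal sketch. This is not the patch itself, and it is written in Python rather than the Scala used by the Spark connector. All names here (`changes_up_to`, `read_entry`, `end_version`, `fake_read`) are hypothetical and stand in for the real log-reading code; the sketch only assumes that log entries are read lazily, one version at a time.

```python
from itertools import takewhile
from typing import Callable, Iterator, List, Optional, Tuple


def changes_up_to(
    start_version: int,
    latest_version: int,
    read_entry: Callable[[int], List[str]],
    end_version: Optional[int] = None,
) -> Iterator[Tuple[int, List[str]]]:
    """Lazily yield (version, actions) pairs, stopping at end_version if given.

    Hypothetical helper: read_entry stands in for opening and parsing one
    log entry, which is the expensive step we want to avoid.
    """
    versions = range(start_version, latest_version + 1)
    if end_version is not None:
        # Cut the version sequence off at the end offset so that later
        # log entries are never opened or parsed.
        versions = takewhile(lambda v: v <= end_version, versions)
    for version in versions:
        yield version, read_entry(version)


# Demo: record which versions are actually read.
reads: List[int] = []


def fake_read(version: int) -> List[str]:
    reads.append(version)
    return [f"file-{version}.parquet"]


# Ten versions exist (0..9), but the batch ends at version 3.
batch = list(changes_up_to(0, 9, fake_read, end_version=3))
```

In this sketch, only versions 0 through 3 are passed to `fake_read`; without `end_version`, all ten versions would be read even though the batch only needs the first four.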
How was this patch tested?
Existing unit tests
Also verified from the logs that no additional Delta log entries are read.
Before:
After:
The difference is even larger when processing a large backlog of versions.
In a Cx setup, batches took more than 300s before the change; after the change, batches complete in under 15s.
Does this PR introduce any user-facing changes?
No