This repository was archived by the owner on May 17, 2024. It is now read-only.

Make local dbt data diffs concurrent #776

Merged on Dec 5, 2023 (27 commits)

Conversation

sungchun12
Contributor

@sungchun12 sungchun12 commented Nov 13, 2023

While working on #770, I realized that all local dbt data diffs run sequentially, which makes data-diff slower than it needs to be to reach the same results. Race conditions shouldn't be an issue because each data-diff query run is independent by design. This is supported by the fact that cloud data diffs have had concurrency built in for a while and have not encountered bugs.
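As a rough illustration of why independent runs are safe to parallelize, here is a minimal sketch, not the PR's actual code. `diff_model` is a hypothetical stand-in for a single diff that shares no state with the others; the point is only that the concurrent path returns the same results as the sequential one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def diff_model(model_name: str) -> Tuple[str, int]:
    # Hypothetical stand-in for one independent diff: it reads only its own
    # input and touches no shared state, so it is safe to run in a thread.
    return model_name, len(model_name)


def run_sequential(models: List[str]) -> List[Tuple[str, int]]:
    # One diff after another, as before the change described above.
    return [diff_model(m) for m in models]


def run_concurrent(models: List[str], max_workers: int = 4) -> List[Tuple[str, int]]:
    # Executor.map preserves input order, so the output matches the
    # sequential version even though the work may overlap in time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(diff_model, models))


models = ["orders", "customers", "payments"]
sequential_results = run_sequential(models)
concurrent_results = run_concurrent(models)
```

Because each task is self-contained, the only thing that changes between the two paths is wall-clock time, not the output.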

  • Updated the concurrent thread mechanism to use Python's built-in ThreadPoolExecutor for both cloud diffs and local diffs
  • Added concurrent progress trackers similar to how cloud diffs track concurrent diffs
  • Made functions and debug logs more specific to help troubleshoot interleaved logging
  • Added detailed tracebacks for errors
  • Updated existing test logic to mock threads
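The pattern behind the bullets above could look roughly like the following sketch. It is not taken from the data-diff codebase: `run_diffs`, the task names, and the logger name are all hypothetical, and it only shows one way to combine `ThreadPoolExecutor`, per-task progress logging, and full tracebacks for failures.

```python
import logging
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger("diff_sketch")  # hypothetical logger name


def run_diffs(tasks, max_workers=4):
    """Run zero-argument callables concurrently.

    tasks maps a label to a callable. Returns (results, errors), where
    errors maps a label to the formatted traceback of its failure.
    """
    results, errors = {}, {}
    total = len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_name = {pool.submit(fn): name for name, fn in tasks.items()}
        # as_completed yields futures as they finish, which gives a natural
        # place to report progress for interleaved work.
        for done, future in enumerate(as_completed(future_to_name), start=1):
            name = future_to_name[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Keep the full traceback so a failure in one thread can be
                # traced without stopping the other diffs.
                errors[name] = "".join(
                    traceback.format_exception(type(exc), exc, exc.__traceback__)
                )
                logger.error("[%s] diff failed:\n%s", name, errors[name])
            # Prefix every message with the task label so interleaved log
            # lines can be attributed to the right diff.
            logger.debug("[%s] finished (%d/%d)", name, done, total)
    return results, errors


def ok():
    return "no differences"


def boom():
    raise ValueError("simulated failure")


results, errors = run_diffs({"orders": ok, "customers": boom})
```

Labeling each log line with the task name is what keeps interleaved output readable, and collecting errors per task lets the remaining diffs finish even when one fails.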

Before this PR: Sequential performance is slow
image

After this PR: Consistent 2-4x performance increase with concurrency enabled
image
image

data-diff PR Demo V1 - Watch Video
Verify Concurrent Data Diffs work with VS Code Extension - Watch Video

@sungchun12 sungchun12 self-assigned this Nov 15, 2023
@sungchun12 sungchun12 marked this pull request as ready for review November 15, 2023 22:36
@sungchun12 sungchun12 requested a review from dlawin November 15, 2023 22:36
@dlawin
Contributor

dlawin commented Nov 17, 2023

Resolves #426

@dlawin dlawin linked an issue Nov 17, 2023 that may be closed by this pull request
@sungchun12 sungchun12 merged commit b3d4223 into master Dec 5, 2023
@sungchun12 sungchun12 deleted the feature/concurrent-dbt-diffs branch December 5, 2023 23:03
Development

Successfully merging this pull request may close these issues.

Run diffs in paralllel
2 participants