Feat!: Add support for concurrent table diff of multiple models #4256

Merged
merged 26 commits into main from themis/diff on May 6, 2025

Conversation

Contributor

@themisvaltinos themisvaltinos commented Apr 25, 2025

This update extends the table_diff command with --select-model so that it can diff a selection of SQLMesh models. Fixes: #4198

diff.mp4

For example, use the downstream selector + to diff sushi.items and all models downstream of it:
sqlmesh table_diff prod:dev sushi.items+

Use the * wildcard to diff all models in a schema:
sqlmesh table_diff prod:dev 'sushi.*'

All the other model selectors work the same way: https://sqlmesh.readthedocs.io/en/latest/guides/model_selection/

  • If the --show-sample flag is included, the output also includes sample rows.
  • If the engine supports it, all the table diffs run concurrently.
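
As a rough illustration of the concurrency point above, here is a minimal sketch of running one diff per model in a thread pool, with a sequential fallback. It is not the code from this PR: `diff_one`, `supports_concurrency`, and `max_workers` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def diff_models_concurrently(
    model_names: List[str],
    diff_one: Callable[[str], str],  # hypothetical: runs a single table diff
    supports_concurrency: bool,      # hypothetical engine capability flag
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run one diff per model, in parallel only when the engine allows it."""
    if not supports_concurrency:
        # Sequential fallback for engines that cannot run diffs in parallel.
        return {name: diff_one(name) for name in model_names}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with model_names.
        results = list(pool.map(diff_one, model_names))
    return dict(zip(model_names, results))


if __name__ == "__main__":
    def fake_diff(name: str) -> str:
        return f"diffed {name}"

    print(diff_models_concurrently(["sushi.items", "sushi.orders"], fake_diff, True))
```

Collecting all results before printing matches the flow described later in this thread, where the full table diffs are shown together at the end.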

@themisvaltinos themisvaltinos force-pushed the themis/diff branch 2 times, most recently from f9124c6 to febb8fb Compare April 29, 2025 20:11
@themisvaltinos
Contributor Author

Thanks for the review @izeigerman. I addressed the comments: added messages indicating which models will be diffed and which will not, and added a progress bar along with information about which models are currently being processed, before the full table diffs are shown at the end. I also added a video to the description of this PR showing how this looks.

@themisvaltinos themisvaltinos changed the title from "Feat: Add support for concurrent table diff across all impacted models" to "Feat: Add support for concurrent table diff of multiple models" May 1, 2025
@sungchun12
Contributor

Can you show specific file path to models that don't have grain specified instead of this generic error?
image

and can you prevent data diffs from running at all until all grains in models to diff are configured?

@themisvaltinos
Contributor Author

themisvaltinos commented May 2, 2025

Can you show specific file path to models that don't have grain specified instead of this generic error?
and can you prevent data diffs from running at all until all grains in models to diff are configured?

Yes, really good points @sungchun12. Changed it to show the file paths and names of the models that don't have a grain, and to prevent the diffs from running in that case. It currently looks like this:

Screenshot 2025-05-02 at 12 02 26
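
The behaviour described in this reply (list every model without a grain together with its file path, and run no diffs at all in that case) could be sketched roughly as follows. `ModelInfo`, `MissingGrainError`, and `check_grains` are hypothetical names for illustration and are not SQLMesh classes or functions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelInfo:
    # Hypothetical stand-in for a model definition; not SQLMesh's Model class.
    name: str
    path: str
    grain: List[str] = field(default_factory=list)


class MissingGrainError(Exception):
    """Raised before any diff runs when a selected model has no grain."""


def check_grains(models: List[ModelInfo]) -> None:
    """Fail fast, naming every model (and file) that lacks a grain."""
    missing = [m for m in models if not m.grain]
    if missing:
        lines = [f"  {m.name} ({m.path})" for m in missing]
        raise MissingGrainError(
            "Grain is not specified for these models:\n" + "\n".join(lines)
        )
```

Calling a check like this once, before any diff is scheduled, is what keeps a partially configured selection from producing partial results.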

@sungchun12
Contributor

sungchun12 commented May 2, 2025

Before you merge, I want to review the UX and hammer away at likely usage patterns. Great update so far! After, I'll approve.

Contributor

@sungchun12 sungchun12 left a comment

Can you add to the table diff guide? Something like, "Diffing multiple models at the same time"

https://sqlmesh.readthedocs.io/en/stable/guides/tablediff/?h=table+diff#diffing-tables-or-views

Found a bug. It currently diffs models that are indirect non-breaking. These should be skipped.

Directly Modified: demo__dev.incremental_model (Non-breaking)
└── Indirectly Modified Children:
    ├── demo__dev.full_model (Indirect Non-breaking)
    └── demo__dev.full_model_example (Indirect Non-breaking)


Models to compare:
├── demo.customers
├── demo.full_model
├── demo.full_model_example
├── demo.incremental_model
└── demo.stg_customers

Noticed a UX bug when table_diff errors out. It makes the cursor invisible in my terminal. I had to run this to see it again.

echo -e '\033[?25h'

bad selector bug. This should NOT run any diffs in this scenario.

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev -m 'git:sungvscode'
fatal: ambiguous argument 'sungvscode...': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Models to compare:
├── demo.stg_customers
├── demo.incremental_model
└── demo.customers


Aborted!
Calculating model differences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0% • pending • 0:00:09
    demo.stg_customers ...                                                                     
demo.incremental_model ...                                                                     
        demo.customers ... 

How exactly does this selector flag work? Looks like only models with direct modifications and not the downstream impacts.

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev -m 'git:sung/vscode'

Models to compare:
├── demo.stg_customers
├── demo.customers
└── demo.incremental_model

When I run a selector that fails, it shows nothing. Can you put a message that says something like, "No models match this model selector"

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev -m 'increm*' 

Also I found another bug, but this may be unrelated to this PR.

Calculating model differences ━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.0% • 1/5 • 0:00:11
        demo.full_model .                                                                   
demo.full_model_example .                                                                   
 demo.incremental_model .                                                                   
     demo.stg_customers .
Error: 10 validation errors for RowDiff
stats.s_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.t_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.join_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.null_grain_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.full_match_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.item_id_matches
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.num_orders_matches
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.updated_at_matches
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.s_only_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type
stats.t_only_count
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/float_type

@themisvaltinos
Contributor Author

Thanks for the review @sungchun12. I addressed the comments and updated the docs, if you want to have another look.

Can you add to the table diff guide? Something like, "Diffing multiple models at the same time"

Yes, added details to the table diff guide, as well as example use cases of the selectors.

Found a bug. It currently diffs models that are indirect non-breaking. These should be skipped.

Addressed it by comparing the data hash instead, so that indirect non-breaking models are skipped.
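
One way to picture the data-hash comparison is the sketch below. It assumes each environment exposes a mapping from model name to a data hash; that mapping and the function name are hypothetical, and the real change in this PR may be structured differently.

```python
from typing import Dict, List


def models_with_data_changes(
    source_hashes: Dict[str, str],  # hypothetical: model name -> data hash
    target_hashes: Dict[str, str],
) -> List[str]:
    """Return models present in both environments whose data hash differs."""
    return sorted(
        name
        for name, source_hash in source_hashes.items()
        if name in target_hashes and target_hashes[name] != source_hash
    )
```

Under this assumption, an indirect non-breaking model keeps the same data hash in both environments, so it never appears in the list of models to compare.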

Noticed a UX bug when table_diff errors out. It makes the cursor invisible in my terminal. I had to run this to see it again.

Good catch! Added a try/except handler that, in these cases, properly closes the rich Live console so that terminal settings such as the cursor are restored.
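
The general pattern can be shown without rich at all. The sketch below swaps in raw ANSI escape sequences and a `try`/`finally` block instead of the rich Live console used in the PR; `run_with_hidden_cursor` is a hypothetical helper, and the show-cursor sequence is the same one used in the `echo` workaround from the review.

```python
import io
from typing import Callable, TextIO

HIDE_CURSOR = "\033[?25l"
SHOW_CURSOR = "\033[?25h"  # same sequence as the echo workaround in the review


def run_with_hidden_cursor(work: Callable[[], None], stream: TextIO) -> None:
    """Hide the cursor while work() runs and always restore it afterwards."""
    stream.write(HIDE_CURSOR)
    try:
        work()
    finally:
        # Runs on success and on error, so the cursor always comes back.
        stream.write(SHOW_CURSOR)
        stream.flush()


if __name__ == "__main__":
    def failing_diff() -> None:
        raise ValueError("diff failed")

    buffer = io.StringIO()
    try:
        run_with_hidden_cursor(failing_diff, buffer)
    except ValueError:
        pass
    print(repr(buffer.getvalue()))
```

The exception still propagates to the caller; only the terminal cleanup is guaranteed.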

bad selector bug. This should NOT run any diffs in this scenario.

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev -m 'git:sungvscode'
fatal: ambiguous argument 'sungvscode...': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Models to compare:

We previously didn't handle the error raised by the git CLI command. Revised so that we capture the error, display it to the user more cleanly, raise an appropriate error, and don't run any diffs.
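
A minimal sketch of this kind of handling, assuming the CLI command is invoked through `subprocess`: stderr is captured rather than leaking to the terminal, and a non-zero exit code becomes a single exception. `SelectorError` and `run_cli` are hypothetical names, not SQLMesh APIs.

```python
import subprocess
import sys
from typing import List


class SelectorError(Exception):
    """Hypothetical error type raised when a selector cannot be resolved."""


def run_cli(command: List[str]) -> str:
    """Run a command, returning stdout or raising with the captured stderr."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise SelectorError(
            f"Command {command[0]!r} failed: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Uses the Python interpreter as a stand-in for a failing git call.
    try:
        run_cli([sys.executable, "-c", "import sys; sys.exit('fatal: bad revision')"])
    except SelectorError as error:
        print(error)
```

Because the exception is raised before any model list is built, the caller can abort without starting a diff.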

How exactly does this selector flag work? Looks like only models with direct modifications and not the downstream impacts.

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev -m 'git:sung/vscode'

Models to compare:
├── demo.stg_customers
├── demo.customers
└── demo.incremental_model

Yes, with the git selector, to include the downstream or upstream dependencies of these models it should be combined with the + selector, e.g. +git:sung/vscode+. I see we didn't have details in the docs for this, so I added more details on how to combine them.
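
Conceptually, a trailing + widens a selection to everything reachable downstream. The sketch below shows that idea as a plain graph traversal; the `children` mapping and `expand_downstream` are hypothetical and do not reflect how SQLMesh resolves selectors internally.

```python
from typing import Dict, Set


def expand_downstream(selected: Set[str], children: Dict[str, Set[str]]) -> Set[str]:
    """Add every model reachable through child edges from the selection."""
    result = set(selected)
    stack = list(selected)
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


if __name__ == "__main__":
    # Edges taken from the plan output quoted earlier in this thread.
    children = {
        "demo.incremental_model": {"demo.full_model", "demo.full_model_example"},
    }
    print(sorted(expand_downstream({"demo.incremental_model"}, children)))
```

A leading + would be the same traversal over parent edges instead of child edges.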

When I run a selector that fails, it shows nothing. Can you put a message that says something like, "No models match this model selector"

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev -m 'increm*' 

Yes, added a relevant message that also displays the selectors used.
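
For illustration only, an empty selection could be reported along these lines. The sketch uses `fnmatch`-style wildcards as a stand-in for SQLMesh's selector syntax, `select_models` is a hypothetical helper, and the message wording is copied from the output shown later in this thread.

```python
from fnmatch import fnmatchcase
from typing import List


def select_models(all_models: List[str], pattern: str) -> List[str]:
    """Return the models matching a wildcard pattern, warning when none do."""
    matched = [name for name in all_models if fnmatchcase(name, pattern)]
    if not matched:
        print(f"No models matched the selection criteria: '{pattern}'")
    return matched


if __name__ == "__main__":
    models = ["demo.customers", "demo.incremental_model"]
    print(select_models(models, "demo.*"))
    print(select_models(models, "1j3hia"))
```

Returning an empty list alongside the message lets the caller stop before any diff is scheduled.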

Also I found another bug, but this may be unrelated to this PR.

Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]

Yes, I believe this is related to issue #4310, which Erin fixed a couple of hours ago.

@themisvaltinos themisvaltinos requested a review from sungchun12 May 5, 2025 18:20
Contributor

@sungchun12 sungchun12 left a comment

Couple suggestions on docs and one more question that may be a bug. If not a bug, I'll approve!

@sungchun12
Contributor

Thanks for removing the models without changes section!

Can you make sure to include that same warning message when no models match criteria?

No models matched the selection criteria: '1j3hia'

Looks like for git selectors, Models without changes: is still showing up when it shouldn't.

The bug examples below.

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff dev:dev_sung -m '*'
(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff dev:dev_sung -m 'git:sung/vscode'

Models without changes:
├── "sqlmesh-public-demo"."demo"."stg_customers"
├── "sqlmesh-public-demo"."demo"."customers"
├── "sqlmesh-public-demo"."demo"."incremental_model"
└── "sqlmesh-public-demo"."demo"."incremental_model_new"

(.venv) ➜  sqlmesh-demos git:(sung/vscode) ✗ sqlmesh table_diff prod:dev_sung -m '1j3hia'
No models matched the selection criteria: '1j3hia'

Models without changes:

@themisvaltinos themisvaltinos merged commit 99cca24 into main May 6, 2025
23 checks passed
@themisvaltinos themisvaltinos deleted the themis/diff branch May 6, 2025 16:27
Development

Successfully merging this pull request may close these issues.

table_diff to run multiple data diffs at the same time based on the differences between environments.
5 participants