Add new functional_dependency generic test macro #988

Open
wants to merge 1 commit into
base: main
32 changes: 32 additions & 0 deletions README.md
@@ -606,6 +606,38 @@ models:
where: "num_orders > 0"
```

### functional_dependency ([source](macros/generic_tests/functional_dependency.sql))

Asserts that one or more columns (the “determinants”) functionally determine another column (the “dependent”). For each unique combination of the determinant columns, there should be exactly one distinct value in the dependent column. If any combination of determinant columns maps to multiple dependent values, the test fails.

Provide [a `where` argument](https://docs.getdbt.com/reference/resource-configs/where) to filter to specific records only (useful for partial checks).

**Usage:**

```yaml
version: 2

models:
  - name: model_name
    columns:
      - name: col_a
      - name: col_b
      - name: col_y
    tests:
      - dbt_utils.functional_dependency:
          determinants:
            - col_a
            - col_b
          dependent: col_y
          # Optional filtering
          config:
            where: "active = true"
```

In this example, `(col_a, col_b)` together determine `col_y`. If any `(col_a, col_b)` pair is associated with more than one distinct `col_y`, the test fails. To use a single column as the determinant, pass a one-item `determinants` list.

Because `where` is a standard [dbt `config`](https://docs.getdbt.com/reference/configs-and-properties) option, it can restrict this test to any subset of rows (e.g., checking the dependency only for recent records).

----

### Grouping in tests
3 changes: 2 additions & 1 deletion docker-compose.yml
@@ -1,8 +1,9 @@
version: "3.7"
services:
postgres:
image: cimg/postgres:9.6
image: cimg/postgres:13.19
environment:
- POSTGRES_USER=root
- POSTGRES_DB=dbt_utils_test
ports:
- "5432:5432"
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
idx,col_a,col_b,col_y
1,1,1,X
2,1,1,Y
3,2,1,X
4,2,1,X
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
col_a,col_b,col_y
1,1,X
1,2,X
2,1,X
2,2,Y
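
As a rough cross-check of the two seed files above, the dependency can be evaluated by hand in plain Python. This is only a sketch: the names `FAIL_SEED`, `PASS_SEED`, and `find_violations` are hypothetical, and labeling the first seed as the failing case and the second as the passing case is an assumption based on their contents, since the file names are not shown here.

```python
import csv
import io
from collections import defaultdict

# Seed contents copied from the two CSV hunks above (labels are assumptions).
FAIL_SEED = """idx,col_a,col_b,col_y
1,1,1,X
2,1,1,Y
3,2,1,X
4,2,1,X
"""

PASS_SEED = """col_a,col_b,col_y
1,1,X
1,2,X
2,1,X
2,2,Y
"""


def find_violations(csv_text, determinants, dependent):
    """Map each determinant tuple to its dependent values when there is more than one."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[col] for col in determinants)
        seen[key].add(row[dependent])
    # Keep only the groups that break the functional dependency.
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


fail_violations = find_violations(FAIL_SEED, ["col_a", "col_b"], "col_y")
pass_violations = find_violations(PASS_SEED, ["col_a", "col_b"], "col_y")

print(fail_violations)  # {('1', '1'): ['X', 'Y']}
print(pass_violations)  # {}
```

Only the `(1, 1)` group in the first seed maps to two `col_y` values, so that seed should produce exactly one failing row; the second seed has no repeated `(col_a, col_b)` pair and should produce none.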
19 changes: 19 additions & 0 deletions integration_tests/models/generic_tests/schema.yml
@@ -197,6 +197,24 @@ seeds:
error_if: "<1" #sneaky way to ensure that the test is returning failing rows
warn_if: "<0"

  - name: data_test_functional_dependency_pass
    data_tests:
      - dbt_utils.functional_dependency:
          determinants:
            - col_a
            - col_b
          dependent: col_y

  - name: data_test_functional_dependency_fail
    data_tests:
      - dbt_utils.functional_dependency:
          determinants:
            - col_a
            - col_b
          dependent: col_y
          error_if: "<1" #sneaky way to ensure that the test is returning failing rows
          warn_if: "<0"

models:
- name: recency_time_included
data_tests:
@@ -261,3 +279,4 @@ models:
compare_model: ref('data_test_equality_a')
exclude_columns:
- col_c

32 changes: 32 additions & 0 deletions macros/generic_tests/functional_dependency.sql
@@ -0,0 +1,32 @@
{% test functional_dependency(model, determinants, dependent, where_clause=None) %}
    {{ return(adapter.dispatch('test_functional_dependency', 'dbt_utils')(model, determinants, dependent, where_clause)) }}
{% endtest %}

{% macro default__test_functional_dependency(model, determinants, dependent, where_clause=None) %}

{# Comma-separated determinant columns, reused in the select list and the group by. #}
{% set determinant_csv = determinants | join(', ') %}

with filtered as (
    select *
    from {{ model }}
    {% if where_clause %}
    where {{ where_clause }}
    {% endif %}
),

{# One row per determinant combination that maps to more than one dependent value. #}
{# count(distinct ...) ignores NULLs, so a NULL next to a single non-NULL value is not reported. #}
violations as (
    select
        {{ determinant_csv }},
        count(distinct {{ dependent }}) as distinct_dependent_count
    from filtered
    group by {{ determinant_csv }}
    having count(distinct {{ dependent }}) > 1
)

select *
from violations

{% endmacro %}
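
To see what the macro's query returns, one option is to render it by hand and run it against a throwaway database. The sketch below assumes `determinants=['col_a', 'col_b']`, `dependent='col_y'`, and no `where_clause`; it uses Python's built-in `sqlite3` as a stand-in for the real warehouse, and the table name `seed` and helper `failing_rows` are hypothetical. Behavior on other adapters may differ in details.

```python
import sqlite3

# Hand-rendered form of the macro body for the assumed arguments above.
RENDERED_SQL = """
with filtered as (
    select *
    from seed
),
violations as (
    select
        col_a, col_b,
        count(distinct col_y) as distinct_dependent_count
    from filtered
    group by col_a, col_b
    having count(distinct col_y) > 1
)
select *
from violations
"""


def failing_rows(rows):
    """Load rows into an in-memory table and return what the test query selects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table seed (col_a integer, col_b integer, col_y text)")
    conn.executemany("insert into seed values (?, ?, ?)", rows)
    result = conn.execute(RENDERED_SQL).fetchall()
    conn.close()
    return result


# Rows mirror the two seed files in this PR (the idx column is omitted).
fail_rows = failing_rows([(1, 1, "X"), (1, 1, "Y"), (2, 1, "X"), (2, 1, "X")])
pass_rows = failing_rows([(1, 1, "X"), (1, 2, "X"), (2, 1, "X"), (2, 2, "Y")])

# A NULL dependent is not counted by count(distinct ...), so this group is not flagged.
null_rows = failing_rows([(1, 1, "X"), (1, 1, None)])

print(fail_rows)  # [(1, 1, 2)]
print(pass_rows)  # []
print(null_rows)  # []
```

The last case is worth noting when reviewing: because `count(distinct ...)` skips NULLs, a determinant group holding one non-NULL value plus NULLs passes the test as written.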