[Bug] Snapshots defined in yaml fail if yaml file exists in target/run directory #11321

Open
2 tasks done
joshuanits opened this issue Feb 19, 2025 · 1 comment · May be fixed by #11323
Labels: bug (Something isn't working), triage

joshuanits commented Feb 19, 2025

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Snapshots defined in a YAML file such as dbt_project/models/schema.yml fail to build or run if a file named dbt_project/target/run/dbt_project/models/schema.yml already exists:

```
Unhandled error while executing target/run/dbt_project/models/schema.yml/schema.yml/snapshot.sql
[Errno 20] Not a directory: 'dbt_project/target/run/dbt_project/models/schema.yml/schema.yml'
```

I'm not sure exactly what causes a schema.yml file to end up in the target directory, but it has happened multiple times.

Expected Behavior

  • Replace . with _ in directory names, e.g. target/run/dbt_project/models/schema_yml/... (ideal, because it is clearer than having directories named with file extensions), or
  • Check that each item in the target/run path is a directory and handle conflicts gracefully (e.g. remove the conflicting file).
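A minimal sketch of the second option, assuming the write step only needs the parent directories of the target path to exist. `ensure_parent_dirs` is a hypothetical helper, not a dbt-core function. It deletes any regular file found where a directory is needed, which is only reasonable because everything under target/ is generated output.

```python
from pathlib import Path


def ensure_parent_dirs(write_path: str) -> list[str]:
    """Create the parent directories of write_path, deleting any regular file
    that occupies a spot where a directory is needed.

    Hypothetical helper for illustration; not part of dbt-core.
    Returns the paths of the files that were removed.
    """
    parent = Path(write_path).parent
    removed = []
    # Path.parents is ordered nearest-first; reverse it to walk top-down,
    # then check the immediate parent itself last.
    for candidate in [*reversed(parent.parents), parent]:
        if candidate.is_file():
            # A stale file (e.g. target/run/.../models/schema.yml) blocks the directory.
            candidate.unlink()
            removed.append(str(candidate))
    parent.mkdir(parents=True, exist_ok=True)
    return removed
```

Called with the snapshot's write path, it removes the stray schema.yml file and creates the directories, so the compiled SQL can then be written normally.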

Steps To Reproduce

```yaml
# dbt_project.yml
name: 'dbt_project'

profile: 'dbt_project'

model-paths: ["models"]
snapshot-paths: ["snapshots"]
```

```yaml
# models/schema.yml
snapshots:
  - name: 'snapshot'
    relation: ref('model')
    config:
      strategy: 'check'
      unique_key: 'col'
      check_cols: 'all'
```

```sql
-- models/model.sql
SELECT 1 as col
```

```shell
dbt build --select model
touch target/run/dbt_project/models/schema.yml
dbt build --select snapshot
```
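The same error can be reproduced without dbt. This sketch assumes the write step creates the parent directories of the target path (for example with `os.makedirs`) before writing the compiled SQL; `reproduce_enotdir` is a hypothetical name used only for illustration.

```python
import errno
import os
import tempfile


def reproduce_enotdir() -> OSError:
    """Mimic the failing write: a plain file sits where a directory is needed."""
    with tempfile.TemporaryDirectory() as tmp:
        models_dir = os.path.join(tmp, "target", "run", "dbt_project", "models")
        os.makedirs(models_dir)
        # Equivalent of `touch target/run/dbt_project/models/schema.yml`.
        open(os.path.join(models_dir, "schema.yml"), "w").close()
        snapshot_path = os.path.join(models_dir, "schema.yml", "schema.yml", "snapshot.sql")
        try:
            # Creating .../schema.yml/schema.yml fails because schema.yml is a file.
            os.makedirs(os.path.dirname(snapshot_path), exist_ok=True)
        except OSError as exc:
            return exc
    raise AssertionError("expected directory creation to fail")


err = reproduce_enotdir()
print(type(err).__name__, err.errno == errno.ENOTDIR)
# prints: NotADirectoryError True  (on Linux and macOS, where ENOTDIR is errno 20)
```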

Relevant log output

```
$ dbt build --select snapshot
07:15:25  Running with dbt=1.9.2
07:15:25  Registered adapter: duckdb=1.9.2
07:15:25  Found 1 model, 1 snapshot, 426 macros
07:15:25  
07:15:25  Concurrency: 1 threads (target='dev')
07:15:25  
07:15:26  1 of 1 START snapshot main.snapshot ............................................ [RUN]
07:15:26  Unhandled error while executing target/run/dbt_project/models/schema.yml/schema.yml/snapshot.sql
[Errno 20] Not a directory: '~/dbt_project/target/run/dbt_project/models/schema.yml/schema.yml'
07:15:26  1 of 1 ERROR snapshotting main.snapshot ........................................ [ERROR in 0.25s]
07:15:26  
07:15:26  Finished running 1 snapshot in 0 hours 0 minutes and 0.43 seconds (0.43s).
07:15:26  
07:15:26  Completed with 1 error, 0 partial successes, and 0 warnings:
07:15:26  
07:15:26    [Errno 20] Not a directory: '~/dbt_project/target/run/dbt_project/models/schema.yml/schema.yml'
07:15:26  
07:15:26  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Environment

- OS: Ubuntu 22.04.5
- Python: 3.12.7
- dbt: 1.9.2
- dbt-duckdb: 1.9.2

Which database adapter are you using with dbt?

Reproduced with DuckDB and Snowflake.

Additional Context

If this isn't already known, I'll look at fixing it myself and opening a PR.

joshuanits added the bug and triage labels on Feb 19, 2025

joshuanits (Author)

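For these YAML-defined nodes, ParsedNode.get_target_write_path() runs with self.original_file_path = "models/schema.yml" and self.path = "schema.yml/snapshot.sql". Because the two basenames differ, the many-to-one branch joins them into models/schema.yml/schema.yml/snapshot.sql, so schema.yml is used as a directory name under target/run/dbt_project/models/.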

```python
import os
from pathlib import Path
from typing import Optional


# Method of ParsedNode (excerpt); the module-level imports above are shown for context.
def get_target_write_path(
    self, target_path: str, subdirectory: str, split_suffix: Optional[str] = None
):
    # This is called for both the "compiled" subdirectory of "target" and the "run" subdirectory
    if os.path.basename(self.path) == os.path.basename(self.original_file_path):
        # One-to-one relationship of nodes to files.
        path = self.original_file_path
    else:
        # Many-to-one relationship of nodes to files.
        path = os.path.join(self.original_file_path, self.path)

    if split_suffix:
        pathlib_path = Path(path)
        path = str(
            pathlib_path.parent
            / pathlib_path.stem
            / (pathlib_path.stem + f"_{split_suffix}" + pathlib_path.suffix)
        )

    target_write_path = os.path.join(target_path, subdirectory, self.package_name, path)
    return target_write_path
```
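To see how the two branches treat this project, the path logic can be pulled out into a standalone function. `current_write_path` is a hypothetical restatement (split_suffix omitted, node attributes passed as arguments), not part of dbt-core's API; the expected outputs assume POSIX path separators.

```python
import os


def current_write_path(original_file_path: str, node_path: str,
                       package_name: str, target_path: str = "target",
                       subdirectory: str = "run") -> str:
    """Hypothetical standalone restatement of the existing branch logic."""
    if os.path.basename(node_path) == os.path.basename(original_file_path):
        # One node per file, e.g. a .sql model.
        path = original_file_path
    else:
        # Many nodes per file: the YAML file name becomes a directory.
        path = os.path.join(original_file_path, node_path)
    return os.path.join(target_path, subdirectory, package_name, path)


print(current_write_path("models/model.sql", "model.sql", "dbt_project"))
# prints: target/run/dbt_project/models/model.sql
print(current_write_path("models/schema.yml", "schema.yml/snapshot.sql", "dbt_project"))
# prints: target/run/dbt_project/models/schema.yml/schema.yml/snapshot.sql
```

The model is written as a file, while the snapshot's path treats target/run/dbt_project/models/schema.yml as a directory, which is why a stray file with that name makes the write fail.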

This seems inelegant. A path such as models/schema_yml/snapshot.sql would avoid directories that look like (and can collide with) files, and would remove the extra level of nesting.

Adding this elif branch does that, although it probably needs to be more robust:

```python
if os.path.basename(self.path) == os.path.basename(self.original_file_path):
    # One-to-one relationship of nodes to files.
    path = self.original_file_path
elif os.path.dirname(self.path) == os.path.basename(self.original_file_path):
    parent_dirname = os.path.dirname(self.original_file_path)
    dirname = os.path.dirname(self.path).replace(".", "_")
    basename = os.path.basename(self.path)
    path = os.path.join(parent_dirname, dirname, basename)
else:
    # Many-to-one relationship of nodes to files.
    path = os.path.join(self.original_file_path, self.path)
```
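A standalone sketch of the proposed elif, with the node attributes passed as arguments. `proposed_write_path` is a hypothetical name, split_suffix is omitted, and the expected output assumes POSIX path separators.

```python
import os


def proposed_write_path(original_file_path: str, node_path: str,
                        package_name: str, target_path: str = "target",
                        subdirectory: str = "run") -> str:
    """Hypothetical standalone restatement of the proposed elif branch."""
    if os.path.basename(node_path) == os.path.basename(original_file_path):
        # One-to-one relationship of nodes to files.
        path = original_file_path
    elif os.path.dirname(node_path) == os.path.basename(original_file_path):
        # Node nested under its YAML file name: "schema.yml" becomes "schema_yml".
        parent_dirname = os.path.dirname(original_file_path)
        dirname = os.path.dirname(node_path).replace(".", "_")
        path = os.path.join(parent_dirname, dirname, os.path.basename(node_path))
    else:
        # Many-to-one relationship of nodes to files.
        path = os.path.join(original_file_path, node_path)
    return os.path.join(target_path, subdirectory, package_name, path)


print(proposed_write_path("models/schema.yml", "schema.yml/snapshot.sql", "dbt_project"))
# prints: target/run/dbt_project/models/schema_yml/snapshot.sql
```

.sql models still take the first branch, so their paths are unchanged; only nodes whose path is nested under the name of their YAML file are moved.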

The resultant structure is much easier to understand:

```
.
├── dbt_project.yml
├── models
│   ├── model.sql
│   └── schema.yml
└── target
    └── run
        └── dbt_project
            └── models
                ├── model.sql
                └── schema_yml
                    └── snapshot.sql
```

joshuanits linked a pull request on Feb 19, 2025 that will close this issue