Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Help with an example of the materialization type clone #832

Open
johnson-jay-l opened this issue Oct 21, 2024 · 7 comments
Open

Help with an example of the materialization type clone #832

johnson-jay-l opened this issue Oct 21, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@johnson-jay-l
Copy link

johnson-jay-l commented Oct 21, 2024

Describe the bug

I would like to try using the materialization type of clone so that I can have a model that "publishes" tested data runs downstream from the actual model being tested. The clone should have the same table name as the upstream model but should be in a different database schema.

This materialization type is not documented and I can't figure out how to get it to work for my use case.

Code for the clone materialization: https://github.com/databricks/dbt-databricks/blob/main/dbt/include/databricks/macros/materializations/clone.sql

Steps To Reproduce

  • Need an example of how to create a model with the clone materialization type

Expected behavior

  • Docs are updated
  • Able to create a table in databricks via shallow clone, that is cloned from an upstream model

Screenshots and log output

n/a

System information

The output of dbt --version:

dbt@06f0d6eceaea:~$ dbt --version
Core:
  - installed: 1.8.6

  The latest version of dbt-core could not be determined!
  Make sure that the following URL is accessible:
  https://pypi.org/pypi/dbt-core/json

Plugins:
  - redshift:   1.8.1 - Up to date!
  - databricks: 1.8.5 - Update available!
  - spark:      1.8.0 - Up to date!
  - postgres:   1.8.2 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using:

The output of python --version:

dbt@06f0d6eceaea:~$ python --version
Python 3.9.20

Additional context

  • I am willing to raise a documentation PR if someone can help me out with an example
@johnson-jay-l johnson-jay-l added the bug Something isn't working label Oct 21, 2024
@johnson-jay-l
Copy link
Author

Here's what I've tried:

test_table.sql

{{
  config(
    materialized='table'
    )
}}

select 1 as col1
union
select 2 as col2

test_table_clone.sql

{{
  config(
    materialized='clone',
    this_relation='test_table',
    defer_relation='test_table',
    schema='bi'
    )
}}

dbt run --target databricks-dev-local --select test_table

04:36:00  ========== Begin Summary ==========
04:36:00  node: model.calm_main.test_table; status: success (error: )
04:36:00  ========== End Summary ==========
04:36:00  1 of 1 START hook: calm_main.on-run-end.0 ...................................... [RUN]
04:36:00  1 of 1 OK hook: calm_main.on-run-end.0 ......................................... [OK in 0.00s]
04:36:00  
04:36:01  
04:36:01  Finished running 1 table model, 1 project hook in 0 hours 0 minutes and 30.69 seconds (30.69s).
04:36:03  
04:36:03  Completed successfully
04:36:03  
04:36:03  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

dbt run --target databricks-dev-local --select test_table_clone

04:36:25  Found 1254 models, 40 snapshots, 23 seeds, 1 operation, 2795 data tests, 1773 sources, 1050 macros
04:36:25  
04:36:34  Concurrency: 4 threads (target='databricks-dev-local')
04:36:34  
04:36:34  1 of 1 START sql clone model sandbox.test_table_clone .................. [RUN]
04:36:34  No relation found in state manifest for model.calm_main.test_table_clone
04:36:34  1 of 1 ERROR creating sql clone model sandbox.test_table_clone ......... [ERROR in 0.04s]

Question - why is it looking for the state manifest? Is there a way to ignore the state manifest and create the clone as long as the upstream task was successful? We often have downstream models running in separate container environments that don't have awareness of the upstream models' state manifests because the success/failure state is managed outside of dbt.

@benc-db
Copy link
Collaborator

benc-db commented Oct 21, 2024

I'm reasonably certain that we don't actually support clone as a materialization, but as support for dbt clone

@johnson-jay-l
Copy link
Author

In theory it seems like a custom materialization could work that uses cloning. I got pretty far with this example below based on table.sql and then calling clone() from a model.sql file instead of using the standard config block.

Would this general direction potentially work as a model materialization type? Or, what would it take to make the clone.sql logic work with models and not just the CLI? Looking for guidance on which direction makes the most sense.

{% macro clone(source_model_name, source_schema, schema, tags, pre_hook, post_hook) -%}
    {%- set source_model_name = source_model_name -%}
    {%- set source_schema = source_schema -%}
    {%- set schema = schema -%}
    {%- set tags = tags | default([]) -%}
    {%- set pre_hook = pre_hook | default('') -%}
    {%- set post_hook = post_hook | default('') -%}

    {{
      config(
            materialized = 'databricks_clone',
            source_model_name = source_model_name,
            source_schema = source_schema,
            schema = schema,
            tags = tags,
            pre_hook = pre_hook,
            post_hook = post_hook
      )
    }}
    -- depends_on: {{ ref(source_model_name) }}

{%- endmacro %}

{% materialization databricks_clone, adapter = 'databricks' %}
  {{ log("MATERIALIZING CLONE") }}
  {%- set source_model_name = config.get('source_model_name') -%}
  {%- set source_schema = config.get('source_schema') -%}
  {%- set target_schema = schema -%}
  {%- set grant_config = config.get('grants') -%}
  {%- set tags = config.get('databricks_tags') -%}

  {{ log("database: " ~ database, info=True) }}
  {{ log("source_model_name: " ~ source_model_name, info=True) }}
  {{ log("source_schema: " ~ source_schema, info=True) }}
  {{ log("target_schema: " ~ schema, info=True) }}

  {%- set target_relation = adapter.get_relation(database=database, schema=target_schema, identifier=source_model_name, needs_information=True) -%}
  {{ log("target_relation: " ~ target_relation, info=True) }}

  {{ log(target_relation, info=True) }}

  {%- set source_relation = adapter.get_relation(database=database, schema=source_schema, identifier=source_model_name, needs_information=True) -%}

  {{ log("source_relation: " ~ source_relation, info=True) }}

  {{ run_hooks(pre_hooks) }}

  -- build model

  {%- call statement('main', language='sql') -%}
    create or replace
    table {{ target_relation }}
    shallow clone {{ source_relation }}
  {%- endcall -%}

  {% set should_revoke = should_revoke(old_relation, full_refresh_mode=True) %}
  {% do apply_grants(target_relation, grant_config, should_revoke) %}
  {%- do apply_tags(target_relation, tags) -%}

  {% do persist_docs(target_relation, model, for_relation=language=='python') %}

  {% do persist_constraints(target_relation, model) %}

  {% do optimize(target_relation) %}

  {{ run_hooks(post_hooks) }}

  {{ return({'relations': [target_relation]})}}

{% endmaterialization %}

@benc-db
Copy link
Collaborator

benc-db commented Oct 21, 2024

First we'd have to remove the blockers to custom materializations (another open ticket) :/. I suggest raising the issue of clone as a supported materialization with dbt Labs, because your use case seems reasonable to me.

@johnson-jay-l
Copy link
Author

I'm new to custom materializations and was going off this doc which implies that custom materializations are supported: https://docs.getdbt.com/guides/create-new-materializations?step=1

Could you elaborate on the blockers? Happy to open another ticket and/or contribute to the code but I need to better understand what I should be asking for vs what is already supported.

@johnson-jay-l
Copy link
Author

I went ahead and created the dbt-core issue and linked it to this one

@benc-db
Copy link
Collaborator

benc-db commented Oct 21, 2024

There are places in dbt-databricks that explicitly enumerate supported materializations that we need to remove.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants