Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add include_data_types argument to generate_model_yaml macro #122

Merged
merged 26 commits into from
Sep 25, 2023

Conversation

linbug
Copy link
Contributor

@linbug linbug commented Apr 7, 2023

resolves dbt-labs/dbt-core#120

Also:

  • updates the default include_data_types value for generate_source to True
  • lowers returned data types for generate_source and generate_model_yaml

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

Checklist

  • This code is associated with an issue which has been triaged and accepted for development.
  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@linbug linbug changed the title Add column data types Add include_data_types argument to generate_model_yaml macro Apr 8, 2023
@linbug linbug marked this pull request as ready for review April 14, 2023 16:58
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linbug Thank you for picking this up! This will be a great addition for folks who start using model contracts in v1.5

{% if parent_column_name %}
{% set column_name = parent_column_name ~ "." ~ column.name %}
{% else %}
{% set column_name = column.name %}
{% endif %}

{% do model_yaml.append(' - name: ' ~ column_name | lower ) %}
{% if include_data_types %}
{% do model_yaml.append(' data_type: ' ~ column.data_type | lower) %}
Copy link
Contributor

@jtcohen6 jtcohen6 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the adapter / data platform, we may want this to be column.dtype instead of column.data_type — I believe that will enable this to return text or varchar instead of character varying(1234).

We could use the dbt.format_column macro, which is also what's used when doing the column type assertion for contracted models.

{% set formatted = adapter.dispatch('format_column', 'dbt')(column) %}
{% do model_yaml.append('        data_type: ' ~ formatted['data_type'] | lower) %}

If we do make this change, the format_column macro is new in v1.5, so this would require a version bump to codegen and to its require-dbt-version

Copy link
Contributor

@dbeatty10 dbeatty10 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 good point about column.dtype vs. column.data_type!

Using dbt.format_column macro

I like your idea of using the logic within the dbt.format_column macro. I think we should aim to preserve the wide range of require-dbt-version: [">=1.0.0", "<2.0.0"] though.

There's multiple approaches we could take to use this logic into codegen without forcing an upgrade to 1.5.0. The quick'n'dirty option is to "vendor" it by just copy-pasting it into codegen. Another option is to inspect the dbt version, and fall back to a vendored version of the macro if it is less than 1.5.0.

Difference between dtype and data_type

My understanding of the difference between the two is dtype gives the database-specific data type with no size, scale, or precision (like varchar or decimal) whereas data_type is intended to give the database-specific data type with those included (like varchar(80) or decimal(18, 2)).

Although dbt model contracts can operate with either input, it will effectively ignore any size/precision/scale that is supplied.

So best would be to exclude size/precision/scale to avoid any false impressions that it will be verified as part of the dbt model contract.

💡 We should fix column.data_type so its value is always valid. It feels very doable to make it work across adapters, and relevant issues opened here, here, and here.

💡 Once column.data_type is fixed, we should consider expanding dbt.format_column() to include it as well.

Copy link
Contributor Author

@linbug linbug Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I added this which I feel should work, but for some reason my tests fail locally because formatted is empty. My dbt version is 1.5.0-b5 which is maybe pre-release and doesn't have dbt.format_columns yet? If you have suggestions for getting around this please let me know, otherwise I can use the vendored version for everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 @jtcohen6 any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dave-connors-3 @benmosher we may want to get this across the line now that generate model is live in the IDE. would be a good improvement to the functionality.

@@ -62,7 +62,7 @@
{% for column in columns %}
{% do sources_yaml.append(' - name: ' ~ column.name | lower ) %}
{% if include_data_types %}
{% do sources_yaml.append(' data_type: ' ~ (column.data_type | upper ) ) %}
{% do sources_yaml.append(' data_type: ' ~ (column.data_type | lower ) ) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, we may want to use .dtype instead of .data_type, depending on the data platform — and the format_column macro (new in v1.5) could be our friend here.

@linbug linbug requested review from jtcohen6 and dbeatty10 May 3, 2023 22:00
@xrpza
Copy link

xrpza commented Jul 20, 2023

Are you looking for additional contributors on this to help keep it moving? Happy to help however I can but obviously don't want to inject myself where I shouldn't.

@dave-connors-3
Copy link
Contributor

@linbug -- this looks great -- if you are able to resolve the conflicts, we can do a final review and release on this!

@jenna-jordan
Copy link

@linbug -- this looks great -- if you are able to resolve the conflicts, we can do a final review and release on this!

Jumping in as someone who has been watching the PR for over a month now - I'm so excited that this is finally going to be merged in! This is a feature I've been waiting for before our team implements contracts, thank you all so much for working on this!

@thomas-vl
Copy link

@linbug do you need any help getting this PR over the finish line? If you don't have time I would love to help!

@dbeatty10
Copy link
Contributor

I'm going to take a shot at resolving the merge conflicts and then do final review.

@gwenwindflower
Copy link
Contributor

@dbeatty10 you are a champion of life

@thomas-vl
Copy link

I'm going to take a shot at resolving the merge conflicts and then do final review.

Let me know if you need help.

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for contributing this @linbug!

My apologies to you and everyone else that have been waiting for this to be merged 🙏

I pushed two main changes:

  1. b2f463c - rely only on the "vendored" version of format_column (rather than using dbt.default__format_column)
    • comparing dbt versions robustly is more trouble than it was worth in this case, so this change greatly simplified things
  2. 6f6aea6 - allow end users to configure their own data type formatting by using multiple dispatch to override the data_type_format_source and/or data_type_format_model macros.

@dbeatty10 dbeatty10 merged commit da3731b into dbt-labs:main Sep 25, 2023
@linbug
Copy link
Contributor Author

linbug commented Oct 3, 2023

Apologies for missing the pings about resolving conflicts. Thank you for merging this!

@dbeatty10
Copy link
Contributor

No prob @linbug -- really appreciate the great work you did here! 🏆

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants