Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add backquotes to columns when executing dbt unit tests #868

Open
tshiomoto opened this issue Dec 10, 2024 · 6 comments
Open

Add backquotes to columns when executing dbt unit tests #868

tshiomoto opened this issue Dec 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@tshiomoto
Copy link

Describe the bug

I created using code like the following (Note that I used Japanese, that is non-ASCII character, for column names):

SELECT
  `Japanese column name1`,
  `Japanese column name2`,
  ...
FROM
  {{ source(schema_name, source_table_name) }}

I defined the schema like the following, and created the csv file for this unit test:

version: 2

sources:
  - name: schema_name
    tables:
      - name: table name

models:
  - name: target_table_name

unit_tests:
  - name: test_name
    model: target_table_name
    given:
      - input: source(schema_name, source_table_name)
        format: csv
        fixture: source_table_input
    expect: 
      format: csv
      fixture: target_table_output

After that, when I executed dbt run command, dbt runs failed with this error:

[INVALID_IDENTIFIER] The unquoted identifier __dbt__cte__target_table_name is invalid and must be back quoted as: `__dbt__cte__target_table_name`.
    Unquoted identifiers can only contain ASCII letters ('a' - 'z', 'A' - 'Z'), digits ('0' - '9'), and underbar ('_').
    Unquoted identifiers must also not start with a digit.
    Different data sources and meta stores may impose additional restrictions on valid identifiers. SQLSTATE: 42602 (line 7, pos 13)

I would appreciate you when you could tell me how I can solve this problem. However, I suppose that it's easiest way to change column names to english but I can't do it that way.

System information

The output of dbt --version:

Core:
  - installed: 1.8.6   
  - latest:    1.9.0 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.8.6 - Update available!
  - spark:      1.8.0 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using :windows 10 (22H2)

The output of python --version:python 3.9.19

Additional context

I found the similar issue. (#859)

@tshiomoto tshiomoto added the bug Something isn't working label Dec 10, 2024
@benc-db
Copy link
Collaborator

benc-db commented Dec 10, 2024

So, I think technically you are supposed to set this on the columns: https://docs.getdbt.com/reference/resource-properties/quote

However, I'm more and more feeling like we should just always backtick column names. @mikealfare, what are your thoughts on this?

@tshiomoto
Copy link
Author

Oh, thank you for giving me that URL! I didn't know that page existed. After setting this config, I try to execute test again.

@tshiomoto
Copy link
Author

I tried to execute test after setting it, but the same error occurred.
I fixed schema.yml like the following:

version: 2

sources:
  - name: schema_name
    tables:
      - name: table name
        columns:
        - name: Japanese column name1
          quote: true
        - name: Japanese column name2
          quote: true
        ...

models:
  - name: target_table_name

unit_tests:
  - name: test_name
    model: target_table_name
    given:
      - input: source(schema_name, source_table_name)
        format: csv
        fixture: source_table_input
    expect: 
      format: csv
      fixture: target_table_output

@benc-db
Copy link
Collaborator

benc-db commented Dec 11, 2024

Thanks for reporting and following up. I'm thinking I'll move towards just always quoting, since there are places where I may have access to column names but not the 'quote' information anymore. I'll add a behavior flag for this for 1.9.1.

@tshiomoto
Copy link
Author

Thank you for your response. I hope that a fixed version will be released.

@benc-db
Copy link
Collaborator

benc-db commented Dec 13, 2024

It doesn't look like I'll have this finished in time for 1.9.1, as my attempts to always backtick columns broke a bunch of tests. I'm still prioritizing this, but I don't think it will be ready until January. Apologies for any inconvenience.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants