get-select/TPCH-dbt-benchmark

Are you here for Snowflake Gen 2 Warehouses?

Decoding the blog article:

  • dbt build in the blog article means building the whole project, with no selectors applied.
  • The other queries mentioned in the blog are in the analysis folder.
  • The create-warehouse scripts are also in the analysis folder.
  • Note: the analysis files lack the .sql extension so that dbt does not parse them.

This repo was forked from https://github.com/clausherther/dbt-tpch and edited so that it runs with a modern version of dbt.

The original author's README follows.


Original Readme

This is a dbt sample project for Snowflake using the TPC-H example dataset that ships as a shared database with Snowflake.

More details can be found on the TPC website and in the specification document.

The project is laid out as follows:

  • _schema contains schema definitions and tests in one .yml file per model
  • _source contains source definitions
  • base contains ephemeral base models that serve as wrappers around source models to define column names and data types where necessary
  • ods represents an Operational Data Store (ODS), i.e. a mostly normalized view of the data. These models may contain more columns than we may choose to publish to the dimensional data warehouse, but don't contain any reporting models.
  • wh represents the Dimensional Data Warehouse (WH). These models use a star schema methodology made up of fact (fct_*) and dimension (dim_*) tables. In addition, this schema contains report models (rpt_*) that combine fact and dimension tables for business reporting.

Profile

  • Add a new profile called tpch to ~/.dbt/profiles.yml:

tpch:
    target: dev
    outputs:
        prod:
            type: snowflake
            threads: 8
            account: <account>
            user: <user>
            password: <password>
            role: <role>
            database: <target_database>
            warehouse: <snowflake_warehouse>
            schema: <default_schema>

        dev:
            type: snowflake
            threads: 8
            account: <account>
            user: <user>
            password: <password>
            role: <role>
            database: <target_database>
            warehouse: <snowflake_warehouse>
            schema: <default_schema>

Packages

This project makes use of the dbt_utils package, so you will need to run dbt deps before running any model to ensure dbt can compile all package macros.
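
For orientation, dbt deps installs whatever is listed in the project's packages.yml. The fragment below is only a sketch of what such an entry typically looks like; the version range is an assumption for illustration and is not taken from this repository, so check the packages.yml that ships with the project before changing anything.

```yaml
# packages.yml (illustrative sketch; the version range below is an assumption,
# not the range pinned by this repository)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

After editing packages.yml, run dbt deps again so the installed package matches the new entry.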

Scaling Factor

You can also change the scaling factor of the TPC-H dataset by editing the source schema in _source/source_tpch.yml: replace the default TPCH_SF10 with TPCH_SF100 or TPCH_SF1000. The database name, SNOWFLAKE_SAMPLE_DATA, stays the same.

version: 2

sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF10
    loader: Snowflake

...
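
As a sketch of the change, switching to scaling factor 100 would only touch the schema line of the source definition shown above. The rest of the file is assumed to stay exactly as it is in the repository and is not reproduced here.

```yaml
version: 2

sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA  # unchanged: the shared sample database
    schema: TPCH_SF100               # was TPCH_SF10; use TPCH_SF1000 for the largest scale
    loader: Snowflake
```

Expect run time and credit usage to grow with the scaling factor, so the timing quoted for TPCH_SF10 will not carry over.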

Snowflake Usage

Using an X-Small warehouse (1 credit per hour), the project currently runs in about 5 minutes against the TPCH_SF10 schema.



About

This repo can be used to benchmark performance in Snowflake