This repo consists of a synthetically generated dataset, along with the config for producing that dataset, and a working dbt project for the data. This data is intended to power the "Thyme to Shine Market" as part of the Lightdash demo.
- Synth for time-series synthetic data generation.
- Python for some data transformation using
pandas
andnumpy
. - google-cloud-sdk for loading to bigquery.
- dbt for data transformations as config.
Note that you shouldn't need to do this very often! The data should be kept up to data with some configuration in the yml files. That means that the data is updated at run time in Lightdash, so we can keep the underlying objects as table to enable good performance and low cost.
sh seed-bigquery.sh
This will:
- use
synth
to generate all the base datasets using the synth configuration defined insynth/
, with help from the lookups define inlookup/
. Note that the final two commands are non-deterministic. - run some bespoke transformations (using Python
pandas
andnumpy
) on the data to improve its suitability for a demo. Note that this script is non-deterministic since it includes some randomness (e.g. to adjust product popularity).
You will need to manually adjust the output data. It's important that you do the following steps:
- Make sure all CSV files have data that runs to at least the current date or slightly beyond - if they don't you'll need to readjust your settings and generate new data.
- Manually remove rows that have dates beyond the current date.
- Make sure to adjust the .yml file date configurations. For each date field, you need to adjust the sql block to specify the maximum timestamp in the given table.
Contact a member of the team to get the correct configuration for your profiles.yml
file. You will need two different targets to update both the internal and external demo sites.
Running dbt build will take the data that exists in the CSVs and create objects in BigQuery, that are then used in the dbt models that reference them.
Tables will be built in lightdash-analytics.lightdash_demo_gardening
cd dbt-bigquery
dbt build
Now the data in the internal version of Thyme to Shine Market will reflect the new data from the generated CSVs.
Now that the internal site is updated, you need to update the externally facing demo site.
Run the same commands as above, but this time using --target demo
which will updates the lightdash-healthcare-demo.lightdash_gardening_demo
dataset.