Bank-ETL-Pipeline

🚀 1.DESCRIPTION

In this project, I build a data pipeline that ingests data sent in real time, loads it into Snowflake, and uses Dbt to transform it there.

🧐 2.ARCHITECTURE

*(Architecture diagram)*

⭐️ 3.DATA WAREHOUSE

*(Data warehouse schema diagram)*

🔥 4.SET UP AND RUN

Due to hardware limitations on my computer, Kafka and Airflow are built together in this project. First, move to the "airflow" directory and run:

docker-compose up -d --build

After this completes, Kafka and Airflow are up and running in Docker. Next, set up the required objects in Snowflake as follows:

  • Move to the "scripts" folder and run: python run_scripts.py
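
The repository's run_scripts.py is not reproduced here, but below is a minimal sketch of what such a setup script could look like, assuming the snowflake-connector-python package, credentials taken from environment variables, and placeholder object names (BANK_DB, RAW, TRANSACTIONS) rather than the repo's actual ones:

```python
# Hypothetical sketch of a setup script like run_scripts.py: create the
# Snowflake database, schema and landing table used by the pipeline.
# Object names and credentials below are placeholders, not the repo's values.
import os

import snowflake.connector  # pip install snowflake-connector-python

SETUP_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS BANK_DB",
    "CREATE SCHEMA IF NOT EXISTS BANK_DB.RAW",
    """
    CREATE TABLE IF NOT EXISTS BANK_DB.RAW.TRANSACTIONS (
        TRANSACTION_ID STRING,
        ACCOUNT_ID     STRING,
        AMOUNT         NUMBER(18, 2),
        CURRENCY       STRING,
        EVENT_TIME     TIMESTAMP_NTZ
    )
    """,
]

def main() -> None:
    # Read credentials from the environment so they never live in the repo.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
    )
    try:
        cur = conn.cursor()
        for statement in SETUP_STATEMENTS:
            cur.execute(statement)
    finally:
        conn.close()

if __name__ == "__main__":
    main()
```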

Now let's start Kafka, Spark and Airflow to send data to Snowflake and transform it with Dbt:

  • Run Apache Kafka:

    • Move to the "airflow" directory and run: docker-compose up
    • Move to the "spark streaming/connection" directory and run: python consumer_bank.py (see the consumer sketch below)
    • Move to the "kafka/connection" directory and run: python producer_bank.py (see the producer sketch below)
  • Run Apache Airflow:

    • Move to the "airflow" directory and run: docker-compose up
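
For reference, here is a sketch of what a consumer like consumer_bank.py might look like: a Spark Structured Streaming job that parses the JSON events from Kafka and appends each micro-batch to a Snowflake landing table via the Spark-Snowflake connector. The topic name, schema, table and connection options are assumptions, not the repository's actual values, and the job needs the Kafka and Snowflake connector packages on the Spark classpath.

```python
# Hypothetical consumer sketch (not the repository's actual consumer_bank.py):
# a Spark Structured Streaming job that parses JSON events from Kafka and
# appends each micro-batch to a Snowflake landing table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Assumed event schema -- must match whatever the producer sends.
schema = StructType([
    StructField("transaction_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("currency", StringType()),
    StructField("event_time", StringType()),
])

spark = SparkSession.builder.appName("bank-consumer").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
    .option("subscribe", "bank_transactions")              # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

# Placeholder Snowflake connection options for the Spark-Snowflake connector.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "BANK_DB",
    "sfSchema": "RAW",
    "sfWarehouse": "COMPUTE_WH",
}

def write_to_snowflake(batch_df, batch_id):
    # foreachBatch lets each micro-batch reuse the batch Snowflake writer.
    (batch_df.write
        .format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", "TRANSACTIONS")
        .mode("append")
        .save())

query = (
    events.writeStream
    .foreachBatch(write_to_snowflake)
    .option("checkpointLocation", "/tmp/bank_consumer_checkpoint")
    .start()
)
query.awaitTermination()
```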
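
Likewise, here is a sketch of what a producer like producer_bank.py might look like, assuming kafka-python, a broker on localhost:9092 and a hypothetical bank_transactions topic; the event fields are made up for illustration:

```python
# Hypothetical producer sketch (not the repository's actual producer_bank.py):
# generate fake bank transactions and publish them to Kafka as JSON.
import json
import random
import time
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def fake_transaction() -> dict:
    """Build one synthetic transaction event (illustrative fields only)."""
    return {
        "transaction_id": str(uuid.uuid4()),
        "account_id": f"ACC{random.randint(1, 999):05d}",
        "amount": round(random.uniform(1.0, 5000.0), 2),
        "currency": "USD",
        "event_time": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    while True:
        producer.send("bank_transactions", value=fake_transaction())
        producer.flush()
        time.sleep(1)  # one event per second keeps the demo lightweight
```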

✅ 5.FINAL RESULT

  • Data pipeline for my project

*(Screenshot: data pipeline)*

  • Lineage graph in Dbt

*(Screenshot: Dbt lineage graph)*

🚨 6.CONCLUSION

Basically, in this project I focus mainly on using Dbt for data transformation, because Dbt is gradually becoming a powerful tool for processing data with SQL.
