
ETL Data Pipeline with Airflow

[Pipeline architecture diagram]

This project builds an ETL data pipeline for Brazilian e-commerce data, using Apache Airflow for orchestration. The pipeline automates the following steps (a minimal DAG sketch follows the list):

  1. Extracting raw data from MinIO, an object storage service.
  2. Modeling and transforming data.
  3. Loading the processed data into a PostgreSQL database.
  4. Serving the data using Grafana for data visualization and business intelligence purposes.
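
A minimal sketch of such a DAG, assuming Airflow 2's TaskFlow API; the bucket, table, credentials, and file names below are illustrative, not the repository's actual code:

  from datetime import datetime

  from airflow.decorators import dag, task

  @dag(schedule_interval=None, start_date=datetime(2023, 1, 1), catchup=False)
  def ecommerce_etl():

      @task
      def extract() -> str:
          # Download a raw file from the MinIO bucket.
          from minio import Minio
          client = Minio("localhost:9000", access_key="minioadmin",
                         secret_key="minioadmin", secure=False)
          client.fget_object("ecommerce", "orders.csv", "/tmp/orders.csv")
          return "/tmp/orders.csv"

      @task
      def transform(path: str) -> str:
          # Clean and model the raw data with pandas.
          import pandas as pd
          df = pd.read_csv(path).dropna()
          out = "/tmp/orders_clean.csv"
          df.to_csv(out, index=False)
          return out

      @task
      def load(path: str) -> None:
          # Write the processed table into PostgreSQL for Grafana to query.
          import pandas as pd
          from sqlalchemy import create_engine
          engine = create_engine(
              "postgresql+psycopg2://user:password@localhost:5432/ecommerce")
          pd.read_csv(path).to_sql("orders", engine,
                                   if_exists="replace", index=False)

      load(transform(extract()))

  ecommerce_etl()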

Deployment

To run this project, create a virtual environment and install the necessary libraries.

  python3 -m venv venv

  source venv/bin/activate

  pip install -r requirements.txt
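
The dependency list is not reproduced here; given the tech stack, a requirements file for this kind of project typically includes packages along these lines (names below are assumptions, not the repository's pinned list):

  apache-airflow
  minio
  pandas
  sqlalchemy
  psycopg2-binary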

Set AIRFLOW_HOME to the current directory:

  export AIRFLOW_HOME=$(pwd)

Initialize the Airflow metadata database:

  airflow db init

Create an admin user:

  airflow users create \
         --username ... \
         --firstname  ... \
         --lastname ... \
         --role ... \
         --email ... 
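
For example, with hypothetical values (Airflow prompts for a password when --password is not supplied):

  airflow users create \
         --username admin \
         --firstname Ada \
         --lastname Admin \
         --role Admin \
         --email admin@example.com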

Start the Airflow webserver (the UI will be served at http://localhost:3030):

  airflow webserver -p 3030

Start the Airflow scheduler:

  airflow scheduler

Start the MinIO, Grafana, and PostgreSQL containers:

  docker compose up -d
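
The compose file ships with the repository; a minimal sketch of such a stack, where images, credentials, and port mappings are all assumptions, might look like:

  services:
    postgres:
      image: postgres:15
      environment:
        POSTGRES_USER: user
        POSTGRES_PASSWORD: password
        POSTGRES_DB: ecommerce
      ports:
        - "5432:5432"
    minio:
      image: minio/minio
      command: server /data --console-address ":9001"
      environment:
        MINIO_ROOT_USER: minioadmin
        MINIO_ROOT_PASSWORD: minioadmin
      ports:
        - "9000:9000"
        - "9001:9001"
    grafana:
      image: grafana/grafana
      ports:
        - "3000:3000"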

Go to http://localhost:9000, create a MinIO bucket, and upload the raw data to it.
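
Bucket creation and upload can also be scripted with the minio Python client; a short sketch (bucket and object names are illustrative):

  from minio import Minio

  # Connect to the local MinIO instance (default root credentials assumed).
  client = Minio("localhost:9000", access_key="minioadmin",
                 secret_key="minioadmin", secure=False)

  # Create the bucket if it does not exist, then upload a raw data file.
  if not client.bucket_exists("ecommerce"):
      client.make_bucket("ecommerce")
  client.fput_object("ecommerce", "orders.csv", "data/orders.csv")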

Demo

DAG:

[Screenshot of the Airflow DAG in the web UI]

Serving:

[Screenshot of the Grafana dashboard]

Tech Stack

Data Processing: Python

Database and Data Storage: PostgreSQL, MinIO

Orchestration: Airflow

Visualization: Grafana

Containerization: Docker
