This project builds an ETL data pipeline for Brazilian e-commerce data, using Apache Airflow for orchestration. The pipeline automates the following steps:
- Extracting raw data from MinIO, an object storage service.
- Modeling and transforming data.
- Loading the processed data into a PostgreSQL database.
- Serving the data using Grafana for data visualization and business intelligence purposes.
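As a rough illustration, the extract and load stages might look like the sketch below. The bucket, object, and table names and the MinIO/PostgreSQL credentials are illustrative assumptions, not values defined by the project:

```python
# Sketch of the extract and load stages. All names and credentials below
# are illustrative assumptions; adapt them to your own setup.
import io

import pandas as pd
from minio import Minio
from sqlalchemy import create_engine


def extract(bucket: str = "ecommerce", object_name: str = "orders.csv") -> pd.DataFrame:
    """Read a raw CSV object from MinIO into a DataFrame."""
    client = Minio("localhost:9000", access_key="minioadmin",
                   secret_key="minioadmin", secure=False)
    response = client.get_object(bucket, object_name)
    try:
        return pd.read_csv(io.BytesIO(response.read()))
    finally:
        response.close()
        response.release_conn()


def load(df: pd.DataFrame, table: str = "orders") -> None:
    """Write a transformed DataFrame into PostgreSQL."""
    engine = create_engine("postgresql://postgres:postgres@localhost:5432/ecommerce")
    df.to_sql(table, engine, if_exists="replace", index=False)
```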
To run this project, create a virtual environment and install the necessary libraries:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Set `AIRFLOW_HOME` to the current directory:
```bash
export AIRFLOW_HOME=$(pwd)
```
Initialize the Airflow metadata database:
```bash
airflow db init
```
Create admin users:
airflow users create \
--username ... \
--firstname ... \
--lastname ... \
--role ... \
--email ...
Start the Airflow webserver:
```bash
airflow webserver -p 3030
```
Start the Airflow scheduler:
```bash
airflow scheduler
```
Start the MinIO, Grafana, and PostgreSQL containers:
```bash
docker compose up -d
```
Go to http://localhost:9000, create a MinIO bucket, and upload the data to it.
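The same step can also be scripted with the MinIO Python client; the bucket name, object name, file path, and credentials below are assumptions for illustration:

```python
# Create the bucket and upload a local file via the MinIO SDK.
# Bucket, object, and file names and credentials are illustrative assumptions.
from minio import Minio

client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

if not client.bucket_exists("ecommerce"):
    client.make_bucket("ecommerce")

client.fput_object("ecommerce", "orders.csv", "data/orders.csv")
```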
DAG:
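A skeleton of how such a DAG could wire the steps together; the `dag_id`, schedule, and placeholder callables are assumptions, not the project's actual definition:

```python
# Skeleton ETL DAG: extract -> transform -> load. The dag_id, schedule,
# and placeholder callables are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull raw objects from MinIO into a staging area (placeholder)."""


def transform():
    """Model and clean the staged data (placeholder)."""


def load():
    """Insert the transformed data into PostgreSQL (placeholder)."""


with DAG(
    dag_id="ecommerce_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```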
Serving:
- Data Processing: Python
- Database and Data Storage: PostgreSQL, MinIO
- Orchestration: Airflow
- Visualization: Grafana
- Containerization: Docker