Building a Data Pipeline on Blood Donation Data in Malaysia

Description

This portfolio project is a practical exploration of Data Engineering and Data Science. It builds an end-to-end data pipeline that ingests, analyzes, and loads daily-updated blood donation data into a locally hosted database. The primary objective is to gain hands-on experience and a solid understanding of what goes into creating and managing a data pipeline.

Dataset

The dataset used for this project is publicly available and can be found here.

Current Developments

  • Developed an Extract-Transform-Load (ETL) pipeline in Python that ingests and processes the daily-updated donation data (a minimal sketch follows this list).
  • Implemented a Telegram bot in Python that delivers automated analysis, pushing insights from the latest data (see the second sketch below).
  • Set up and manage a local MySQL database via XAMPP to store and update the blood donation data.
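The ETL and database bullets above map to a single daily job: pull the latest snapshot of the donation data, clean it with pandas, and write it into the local MySQL database served by XAMPP. The sketch below shows one way that job could look; the URL, table name, column names, and credentials are illustrative placeholders, not the project's actual configuration.

```python
# Minimal ETL sketch: download the daily-updated donation CSV, tidy it with
# pandas, and load it into the local MySQL database served by XAMPP.
# NOTE: the URL, credentials, table, and column names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

CSV_URL = "https://example.org/blood-donations/donations_daily.csv"  # placeholder source
DB_URI = "mysql+pymysql://root:@localhost:3306/blood_donation"       # typical local XAMPP MySQL

def extract() -> pd.DataFrame:
    """Download the latest snapshot of the donation data."""
    return pd.read_csv(CSV_URL)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: parse dates, drop rows without one, sort chronologically."""
    df["date"] = pd.to_datetime(df["date"])
    return df.dropna(subset=["date"]).sort_values("date")

def load(df: pd.DataFrame) -> None:
    """Replace the staging table with the freshly transformed snapshot."""
    engine = create_engine(DB_URI)
    df.to_sql("donations", engine, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract()))
```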
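For the automated analysis, a small script can query the freshly loaded table, build a short text summary, and push it through the Telegram Bot API's sendMessage method. This is a minimal sketch assuming a bot token and chat id; the actual bot may instead use a framework such as python-telegram-bot, and the table and column names here are hypothetical.

```python
# Minimal sketch of pushing an automated summary to Telegram.
# BOT_TOKEN, CHAT_ID, and the table/column names are placeholders.
import requests
import pandas as pd
from sqlalchemy import create_engine

BOT_TOKEN = "123456:ABC-your-bot-token"  # hypothetical bot token
CHAT_ID = "123456789"                    # hypothetical chat id
DB_URI = "mysql+pymysql://root:@localhost:3306/blood_donation"

def daily_summary() -> str:
    """Build a one-line summary from the locally stored donation table."""
    engine = create_engine(DB_URI)
    df = pd.read_sql("SELECT date, donations FROM donations", engine)
    latest = df.sort_values("date").iloc[-1]
    return f"Blood donations on {latest['date']:%Y-%m-%d}: {latest['donations']}"

def send_message(text: str) -> None:
    """Deliver the summary via the Telegram Bot API's sendMessage method."""
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    requests.post(url, data={"chat_id": CHAT_ID, "text": text}, timeout=10)

if __name__ == "__main__":
    send_message(daily_summary())
```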

Future Planning

  • Automate the ETL process with Python so that scheduled runs push data updates to the database without manual steps (a minimal scheduling sketch follows this list).
  • Explore cloud computing options, specifically Google Cloud Platform or AWS, to run the script on a virtual machine in the cloud, making the pipeline more scalable, reliable, and able to handle larger workloads.
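As a concrete starting point for the automation plan, the ETL job can be triggered on a timer from Python itself. The sketch below uses the third-party `schedule` package and a hypothetical `etl.run_pipeline` entry point; on a cloud VM the same effect could come from cron or a managed scheduler, so this is one illustrative option rather than the project's final design.

```python
# Minimal automation sketch: run the ETL job once a day using the `schedule` package.
import time
import schedule

from etl import run_pipeline  # hypothetical module wrapping extract/transform/load

schedule.every().day.at("09:00").do(run_pipeline)  # daily refresh at 09:00

if __name__ == "__main__":
    while True:
        schedule.run_pending()
        time.sleep(60)  # check for due jobs once a minute
```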

Technologies Used

  • Python: For scripting and data analysis
  • XAMPP: For creating and managing the local database
  • Telegram Bot: For delivering automated analysis

This project showcases my ability to build and manage data pipelines, automate data analysis, and explore advanced data engineering concepts like cloud computing. It reflects my commitment to continuous learning and applying data science to real-world scenarios.