This portfolio project is a practical exploration of Data Engineering and Data Science. It builds an end-to-end data pipeline that ingests, analyzes, and loads daily-updated blood donation data into a local database. The primary objective is to gain hands-on experience with the details of creating and managing a data pipeline.
The dataset used for this project is publicly available and can be found here.
- Developed an Extract-Transform-Load (ETL) pipeline in Python to ingest and process the daily-updated donation data (a minimal sketch follows this list).
- Implemented a Telegram bot in Python that delivers automated analysis, pushing insights from the latest data directly to a chat (see the bot sketch below).
- Established and manage a local database using XAMPP for storing and updating the blood donation data (see the load sketch below).
- Plan to automate the ETL process with Python so that database updates run on a schedule without manual intervention (a scheduling sketch follows this list).
- Exploring cloud computing options, specifically Google Cloud Platform or AWS, to run the script on a virtual machine in the cloud, which would improve the pipeline's reliability and scalability and remove the dependency on a local machine.
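To make the ETL step concrete, here is a minimal extract-and-transform sketch. The URL and the `date` column name are placeholders for illustration, not the actual dataset schema:

```python
import pandas as pd

# Placeholder URL: the real dataset is the public one linked above.
DATA_URL = "https://example.com/blood-donations.csv"

def extract(url: str = DATA_URL) -> pd.DataFrame:
    """Download the daily-updated CSV into a DataFrame."""
    return pd.read_csv(url)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: parse the date column and drop duplicate rows."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])  # assumes a 'date' column exists
    return df.drop_duplicates()
```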
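A load step against the XAMPP-hosted database could look like the sketch below, using pandas with SQLAlchemy and the PyMySQL driver. The connection string assumes XAMPP's common defaults (user `root`, empty password, port 3306); the database and table names are made up for the example:

```python
import pandas as pd
from sqlalchemy import create_engine

# XAMPP's bundled MySQL/MariaDB server usually listens on localhost:3306
# with user 'root' and an empty password; database name is an assumption.
engine = create_engine("mysql+pymysql://root:@localhost:3306/blood_donation")

def load(df: pd.DataFrame, table: str = "donations") -> None:
    """Replace the target table with the latest cleaned data."""
    df.to_sql(table, engine, if_exists="replace", index=False)
```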
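The Telegram delivery can be as small as one HTTP call to the Bot API's `sendMessage` method; the token and chat ID below are placeholders:

```python
import requests

BOT_TOKEN = "YOUR_BOT_TOKEN"  # placeholder: token issued by @BotFather
CHAT_ID = "YOUR_CHAT_ID"      # placeholder: target chat or channel ID

def send_summary(text: str) -> None:
    """Send an analysis summary to a Telegram chat via the Bot API."""
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    response = requests.post(url, data={"chat_id": CHAT_ID, "text": text}, timeout=10)
    response.raise_for_status()
```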
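For the planned automation, one lightweight option (an assumption about the eventual approach, reusing the functions sketched above) is the third-party `schedule` library:

```python
import time

import schedule  # third-party: pip install schedule

def run_pipeline() -> None:
    """One full pass: extract, transform, load, then notify via Telegram."""
    df = transform(extract())
    load(df)
    send_summary(f"Blood donation data refreshed: {len(df)} rows loaded.")

# Trigger the pipeline once a day and keep the process alive.
schedule.every().day.at("09:00").do(run_pipeline)

while True:
    schedule.run_pending()
    time.sleep(60)
```

On a cloud VM the same loop can run unattended under a process supervisor, or a cron job could replace it entirely.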
- Python: For scripting and data analysis
- XAMPP: For creating and managing the local database
- Telegram Bot: For delivering automated analysis
This project showcases my ability to build and manage data pipelines, automate data analysis, and explore advanced data engineering concepts like cloud computing. It reflects my commitment to continuous learning and applying data science to real-world scenarios.