Repository containing the notebook for my big data project involving EDA and Machine Learning on the NY Taxi Fare dataset.

kaushikrohit004/NYC-Taxi-Demand-Prediction

NYC-Taxi-Demand-Prediction

This repository contains the notebook for my big data project involving EDA and machine learning on the NY Taxi Fare dataset.

Motivation, Challenge & Accomplishment

  • Motivation: To build and deploy a data science project with Apache Spark and AWS cloud services. After gaining enough theoretical knowledge of the data science workflow and machine learning techniques, and implementing a few projects locally, I wanted to go further and work through a development and deployment workflow typical of industry (that is, cloud operations), so I searched for a large enough dataset to use.

  • Challenge: Getting up and running with Spark and its Python API, PySpark, as well as managing clusters on AWS EMR, none of which I had done before. This project also kicked off my attempt to complete one data science project each month, starting with this one for October.

  • Accomplishment: Successfully performed EDA and feature engineering on the dataset, and used Spark MLlib to build random forest (RF) and decision tree (DT) models, achieving an RMSE of 4.28.
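
For readers unfamiliar with the metric, here is a minimal plain-Python sketch of how an RMSE (root mean squared error) value like the one above is computed. The fare values and the `rmse` helper are hypothetical and for illustration only; they are not taken from the notebook, which would more likely rely on Spark MLlib's own evaluator (for example `RegressionEvaluator` with `metricName="rmse"`).

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    # Square each residual, average them, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical fares and model predictions (not from the dataset).
actual_fares = [10.0, 12.0, 8.0, 20.0]
predicted_fares = [13.0, 8.0, 8.0, 20.0]

score = rmse(actual_fares, predicted_fares)
print(score)
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones, and the result is expressed in the same units as the target being predicted.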


Tech Stack

  • Python 3
  • pandas
  • matplotlib
  • seaborn
  • PySpark
  • AWS EMR
  • AWS EC2
  • Spark MLlib
