NYC-Taxi-Demand-Prediction

Motivation: Build and deploy a data science project with cloud technologies such as Apache Spark and AWS. After gaining enough theoretical knowledge of the data science workflow and machine learning techniques, and implementing a few projects locally, I wanted to go further and work through the kind of development and deployment workflow typically used in industry (that is, cloud operations), so I looked for a dataset large enough to justify it.
Challenge: Getting up and running with Spark and its Python API, PySpark, as well as managing clusters on AWS EMR, none of which I had done before. This project also kicked off my attempt to complete one data science project each month, starting with this one for October.
Accomplishment: Performed EDA and feature engineering on the dataset and used Spark MLlib to build random forest (RF) and decision tree (DT) models, achieving an RMSE of 4.28.
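
For readers unfamiliar with the metric, here is a minimal sketch of how RMSE (root mean squared error) is computed. It is plain Python rather than the notebook's Spark code, and the `rmse` helper and the sample values are hypothetical illustrations, not data or results from this project. In Spark MLlib the same quantity is available through `RegressionEvaluator` with `metricName="rmse"`.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical target values and model predictions (not from the project data).
actual_values = [10, 14, 20, 8]
predicted_values = [12, 14, 17, 9]

error = rmse(actual_values, predicted_values)
print(round(error, 2))  # 1.87
```

Because the residuals are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is expressed in the same units as the predicted target.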
