My solution for the Kaggle competition New York City Taxi Trip Duration. This solution uses the gradient boosted Decision Tree library LightGBM and ranked 49 out of 1257 (Top 4%).

pklauke/Kaggle-NYCTaxi

Kaggle-NYCTaxi

This repository contains my solution for the Kaggle competition New York City Taxi Trip Duration. This solution ranked 49 out of 1257.

The goal of this competition was to predict the trip duration of taxi trips in New York City in the first half of 2016. My solution focussed mainly on the feature engineering part.

The exploratory data analysis, the feature engineering and the model fitting were all done in the notebook Predictor. The most important provided features were the timestamp, the pickup and dropoff coordinates, and the vendor ID of each taxi trip.

Some of the engineered features were extracted from the timestamp. Many others were derived from the pickup and dropoff coordinates, e.g. geospatial and temporal aggregations using the clustering algorithm KMeans or a principal component analysis. More unusually, I also estimated the average speed that trips would have on the streets of their fastest route at the respective time of day and day of week. This estimate was based on the fastest route dataset.
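The kinds of features described above can be sketched roughly as follows. This is a minimal illustration and not the code from the Predictor notebook: the column names (`pickup_datetime`, `pickup_latitude`, and so on), the helper `add_features`, the number of clusters and the toy data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def add_features(df, n_clusters=4, seed=0):
    """Return a copy of df with illustrative time and location features.

    Hypothetical helper: column names and n_clusters are assumptions.
    """
    out = df.copy()

    # Features extracted from the timestamp.
    ts = pd.to_datetime(out["pickup_datetime"])
    out["pickup_hour"] = ts.dt.hour
    out["pickup_weekday"] = ts.dt.dayofweek  # Monday = 0

    # Stack pickup and dropoff points so both ends of a trip
    # share one set of clusters and one set of PCA axes.
    pickup = out[["pickup_latitude", "pickup_longitude"]].to_numpy()
    dropoff = out[["dropoff_latitude", "dropoff_longitude"]].to_numpy()
    points = np.vstack([pickup, dropoff])

    # KMeans assigns each coordinate pair to a spatial cluster.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(points)
    out["pickup_cluster"] = kmeans.predict(pickup)
    out["dropoff_cluster"] = kmeans.predict(dropoff)

    # PCA rotates the coordinates onto their principal axes.
    pca = PCA(n_components=2).fit(points)
    pickup_pca = pca.transform(pickup)
    dropoff_pca = pca.transform(dropoff)
    out["pickup_pca0"] = pickup_pca[:, 0]
    out["pickup_pca1"] = pickup_pca[:, 1]
    out["dropoff_pca0"] = dropoff_pca[:, 0]
    out["dropoff_pca1"] = dropoff_pca[:, 1]

    # One geospatial and temporal aggregation: the number of trips
    # that start in the same cluster during the same hour of day.
    out["cluster_hour_count"] = out.groupby(
        ["pickup_cluster", "pickup_hour"]
    )["pickup_latitude"].transform("count")
    return out


# Toy data around Manhattan, made up purely for the demonstration.
rng = np.random.default_rng(0)
n_trips = 8
demo = pd.DataFrame({
    "pickup_datetime": [
        "2016-01-01 08:15:00", "2016-01-01 08:40:00",
        "2016-01-01 17:05:00", "2016-01-02 09:30:00",
        "2016-01-02 23:10:00", "2016-01-03 12:00:00",
        "2016-01-04 08:20:00", "2016-01-04 18:45:00",
    ],
    "pickup_latitude": 40.75 + rng.normal(0, 0.02, n_trips),
    "pickup_longitude": -73.98 + rng.normal(0, 0.02, n_trips),
    "dropoff_latitude": 40.75 + rng.normal(0, 0.02, n_trips),
    "dropoff_longitude": -73.98 + rng.normal(0, 0.02, n_trips),
})

features = add_features(demo)
print(features[["pickup_hour", "pickup_weekday", "pickup_cluster",
                "dropoff_cluster", "cluster_hour_count"]])
```

Fitting KMeans and PCA on the stacked pickup and dropoff points is one possible design choice; it keeps cluster labels and rotated axes comparable between the two ends of a trip.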

The model was trained with LightGBM, a fast gradient boosting library. It was trained several times with different random seeds, and the predictions of these runs were averaged for the final submission.
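Averaging over several seeds can be sketched as below. This is an illustration under stated assumptions rather than the notebook's actual code: `average_over_seeds` and `make_lgbm` are hypothetical helpers, and the LightGBM parameter values are placeholders, not the settings used for the submission.

```python
import numpy as np


def average_over_seeds(make_model, X_train, y_train, X_test, seeds):
    """Fit one model per seed and return the mean of their predictions.

    make_model is any callable that takes a seed and returns an object
    with fit(X, y) and predict(X) methods.
    """
    predictions = []
    for seed in seeds:
        model = make_model(seed)
        model.fit(X_train, y_train)
        predictions.append(model.predict(X_test))
    # Element-wise mean across the runs.
    return np.mean(predictions, axis=0)


def make_lgbm(seed):
    """Hypothetical factory; all parameter values are placeholders."""
    # Imported lazily so the helper above works without LightGBM installed.
    import lightgbm as lgb

    return lgb.LGBMRegressor(
        n_estimators=200,
        learning_rate=0.05,
        subsample=0.8,         # row subsampling, so that the seed matters
        subsample_freq=1,
        colsample_bytree=0.8,  # column subsampling, also seed dependent
        random_state=seed,
    )
```

With training and test arrays at hand, a call could look like `average_over_seeds(make_lgbm, X_train, y_train, X_test, seeds=[0, 1, 2])`. The seed only changes the result when the model has a source of randomness, which is why the sketch enables row and column subsampling.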
