Analyze and visualize ride-sharing data using Python, Pandas, and Matplotlib.

namu12345/PyBer_Analysis

Overview

PyBer_Analysis :

PyBer_Analysis is an analysis of a hypothetical ride-sharing app company, "PyBer". In this project I help Omar (a manager at PyBer) analyze all the ride-share data from January to early May of 2019 and create a compelling visualization for the CEO, V. Isualize.

Purpose of PyBer_Analysis :

The purpose of this analysis is to create a summary DataFrame that shows ride-sharing data by city type (Rural, Suburban, and Urban), and then to create a multiple-line graph of total weekly fares for each city type. We first used the Pandas groupby() function with the count() and sum() methods to get the total number of rides, drivers, and fares by city type. With those totals assigned to variables, we calculated the average fare per ride and the average fare per driver. We then combined everything into a new DataFrame and formatted its columns. Next, we created a pivot table of the total fares for each city type by date, and used the resample() function to get the weekly sum of fares for each city type. Finally, we visualized the weekly fares as a line chart for the CEO, V. Isualize.

Results :

As mentioned earlier, we started our analysis by using the Pandas groupby() function with the count() and sum() methods on the PyBer DataFrame columns to get the total number of rides, the total number of drivers, and the total fares for each city type. Then we calculated the average fare per ride and the average fare per driver for each city type. Finally, we added the data to a new DataFrame and formatted its columns to make them easier to read.

  • The results are shown below:

image image

  • After cleaning and formatting, the DataFrame looks like this:

image
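The summary step described above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the project's actual notebook code: the `city`, `ride_id`, and `driver_count` column names and the `pyber_summary_df` name are assumptions, and only `pyber_data_df`, `type`, and `fare` appear in the text.

```python
import pandas as pd

# Toy stand-in for the merged ride/city data (all values are made up).
pyber_data_df = pd.DataFrame({
    "city": ["A", "A", "B", "C"],
    "type": ["Urban", "Urban", "Suburban", "Rural"],
    "ride_id": [1, 2, 3, 4],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "driver_count": [5, 5, 3, 1],
})

# Totals per city type.
total_rides = pyber_data_df.groupby("type")["ride_id"].count()
total_fares = pyber_data_df.groupby("type")["fare"].sum()
# driver_count repeats on every ride row, so count each city only once.
total_drivers = (
    pyber_data_df.drop_duplicates("city").groupby("type")["driver_count"].sum()
)

# Combine the totals and the derived averages into one summary DataFrame.
pyber_summary_df = pd.DataFrame({
    "Total Rides": total_rides,
    "Total Drivers": total_drivers,
    "Total Fares": total_fares,
    "Average Fare per Ride": total_fares / total_rides,
    "Average Fare per Driver": total_fares / total_drivers,
})
pyber_summary_df.index.name = None  # drop the "type" label for display
print(pyber_summary_df)
```

Because all of the Series share the same `type` index, pandas aligns them automatically when the summary DataFrame is built.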

Next, we used two functions that were new to us, pivot() and resample(), to build a multiple-line graph of the total fares for each week by city type. To prepare for pivoting and resampling, we grouped the pyber_data_df DataFrame on the "type" and "date" columns with groupby() and applied the sum() method to the "fare" column to get the total fare amount for each date.

  • This produced the following result:

image
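As a rough sketch of this grouping step, assuming a tiny made-up version of `pyber_data_df` (the `fares_by_type_date` name is hypothetical):

```python
import pandas as pd

# Toy ride data; two Urban rides share the same timestamp on purpose.
pyber_data_df = pd.DataFrame({
    "type": ["Urban", "Urban", "Rural", "Urban"],
    "date": [
        "2019-01-01 08:00:00",
        "2019-01-01 08:00:00",
        "2019-01-02 09:30:00",
        "2019-01-03 10:00:00",
    ],
    "fare": [10.0, 5.0, 40.0, 20.0],
})

# Group on both columns and total the fares; the result has a
# two-level (type, date) index and a single "fare" column.
fares_by_type_date = pyber_data_df.groupby(["type", "date"])[["fare"]].sum()
print(fares_by_type_date)
```

Selecting `[["fare"]]` with double brackets keeps the result as a DataFrame rather than a Series, which is convenient for the later steps.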

  • Next, we had to reset the index on the DataFrame created in the previous step. This is needed in order to use the pivot() function later in the analysis. This is how I reset the index:

image
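A minimal sketch of the index reset, again on made-up data with hypothetical variable names:

```python
import pandas as pd

pyber_data_df = pd.DataFrame({
    "type": ["Urban", "Urban", "Rural"],
    "date": ["2019-01-01", "2019-01-01", "2019-01-02"],
    "fare": [10.0, 5.0, 40.0],
})
fares_by_type_date = pyber_data_df.groupby(["type", "date"])[["fare"]].sum()

# reset_index() moves the "type" and "date" index levels back into
# ordinary columns, which pivot() needs to refer to them by name.
fares_flat = fares_by_type_date.reset_index()
print(fares_flat.columns.tolist())
```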

  • After resetting the index, we created the pivot table with the pivot() function, which produced the result below:

image
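The pivot step might look something like the following sketch. The input frame and the `fares_flat` and `fares_pivot` names are illustrative assumptions.

```python
import pandas as pd

# Toy output of the groupby + reset_index steps (values are made up).
fares_flat = pd.DataFrame({
    "type": ["Rural", "Urban", "Urban"],
    "date": ["2019-01-02", "2019-01-01", "2019-01-02"],
    "fare": [40.0, 15.0, 20.0],
})

# One row per date, one column per city type, fares as the values.
# Date/type combinations with no rides come out as NaN.
fares_pivot = fares_flat.pivot(index="date", columns="type", values="fare")
print(fares_pivot)
```

Note that pivot() raises an error if an index/column pair appears more than once, which is why the fares are summed per type and date beforehand.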

  • Since we are interested in the data from January through late April 2019, we created a new pivot table by using the loc method on the date range from 2019-01-01 through 2019-04-28. The result is shown in the image below:

image
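A small sketch of the date-range selection, assuming a pivot table whose index holds date strings in sorted order (the data and the `fares_jan_apr` name are made up):

```python
import pandas as pd

# Toy pivot table indexed by date strings, already sorted.
fares_pivot = pd.DataFrame(
    {
        "Rural": [1.0, 2.0, 3.0, 4.0, 5.0],
        "Urban": [10.0, 20.0, 30.0, 40.0, 50.0],
    },
    index=pd.Index(
        ["2018-12-31", "2019-01-01", "2019-03-15", "2019-04-28", "2019-05-02"],
        name="date",
    ),
)

# Label-based slicing with loc includes both endpoints.
fares_jan_apr = fares_pivot.loc["2019-01-01":"2019-04-28"]
print(fares_jan_apr)
```

One caveat with this approach: on a string index the comparison is lexicographic, so a label such as "2019-04-28 19:35:03" would sort after "2019-04-28" and be excluded. Converting the index to datetime before slicing avoids that.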

  • Our final step was to resample the data into weekly bins to get the total fares for each week. Resampling requires a datetime index, so we first converted the "date" index to the datetime data type. The results of changing the data type and of resampling the data are shown below:

image

  • Resample :

image
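The datetime conversion and weekly resampling can be sketched as below. The data and variable names are made up, and the "W" rule (weeks ending on Sunday) is an assumption about how the weekly bins were defined.

```python
import pandas as pd

# Toy pivot table with string dates (values are made up).
fares_pivot = pd.DataFrame(
    {
        "Rural": [float("nan"), 40.0, float("nan")],
        "Urban": [10.0, 20.0, 5.0],
    },
    index=pd.Index(["2019-01-01", "2019-01-03", "2019-01-08"], name="date"),
)

# resample() only works on a datetime-like index, so convert it first.
fares_pivot.index = pd.to_datetime(fares_pivot.index)

# "W" groups rows into weeks ending on Sunday; sum() skips NaN values.
weekly_fares = fares_pivot.resample("W").sum()
print(weekly_fares)
```

Here the first two dates fall in the week ending 2019-01-06 and the third in the week ending 2019-01-13, so the result has two rows.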

The last section of our analysis visualizes the data graphically. We created a line graph showing the total weekly fares for each city type from January to April 2019.

image
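A chart of this kind could be drawn along the following lines. This is only a sketch: the weekly values are made up, and the title, axis label, and figure size are assumptions rather than the settings used for the image above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Toy weekly fare totals per city type (values are made up).
weekly_fares = pd.DataFrame(
    {
        "Rural": [180.0, 70.0, 170.0, 330.0],
        "Suburban": [720.0, 1100.0, 1200.0, 1040.0],
        "Urban": [1660.0, 2050.0, 1940.0, 2130.0],
    },
    index=pd.date_range("2019-01-06", periods=4, freq="W"),
)

# DataFrame.plot draws one line per column on the given axes.
fig, ax = plt.subplots(figsize=(12, 4))
weekly_fares.plot(ax=ax)
ax.set_title("Total Fare by City Type")
ax.set_ylabel("Fare ($USD)")
ax.set_xlabel("")
ax.legend(title="type")
```

Using the object-oriented interface (`fig, ax`) keeps the title, labels, and legend tied to one axes object, which makes the figure easier to adjust or save later with `fig.savefig(...)`.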

Summary :

To summarize the PyBer analysis, I noticed the following differences:

  • The PyBer summary DataFrame shown in the section above indicates that rural cities have very few drivers: 78, compared with 490 for suburban cities and 2405 for urban cities.
  • Urban cities therefore have more than 4 times as many drivers as suburban cities. Suburban cities have more than 6 times as many drivers as rural cities, with almost 4.5 times the revenue.
  • The average fare per driver is much higher for rural cities, at $55.49, than for urban cities, at $16.57.
  • Drivers in rural cities earn more per driver than drivers in urban cities. The low average fare per driver in urban cities could discourage potential drivers there from working with PyBer.
  • In conclusion, rural areas likely command higher fares because fewer drivers serve them and trips are most likely longer in time and distance, which gives rural cities the highest average fare per ride and per driver of all city types.
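The ratios quoted in the list above can be checked with a few lines of arithmetic. The sketch below uses only the driver counts and average fares stated in the text, and the variable names are illustrative.

```python
# Figures taken from the summary bullets above.
drivers = {"Rural": 78, "Suburban": 490, "Urban": 2405}
avg_fare_per_driver = {"Rural": 55.49, "Urban": 16.57}

urban_vs_suburban = drivers["Urban"] / drivers["Suburban"]
suburban_vs_rural = drivers["Suburban"] / drivers["Rural"]
rural_vs_urban_fare = avg_fare_per_driver["Rural"] / avg_fare_per_driver["Urban"]

print(f"Urban vs. suburban drivers: {urban_vs_suburban:.1f}x")    # 4.9x
print(f"Suburban vs. rural drivers: {suburban_vs_rural:.1f}x")    # 6.3x
print(f"Rural vs. urban fare per driver: {rural_vs_urban_fare:.1f}x")  # 3.3x
```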

Based on the analysis, my business recommendation to PyBer is to increase the number of drivers in rural areas so that there are enough drivers to meet ride demand. The data for rural cities shows that the average fare per ride and the average fare per driver are much higher than in suburban and urban cities. This may indicate that riders in rural areas are taking trips over longer distances. As a result, most drivers can be occupied with current trips, and potential revenue is lost when business peaks.