---
title: 'Analysis of Bike Sharing Company: Cyclistic'
author: "Wale Adio"
date: "2023-08-19"
output:
  pdf_document:
    toc: yes
  html_document:
    toc: yes
    number_sections: yes
    fig_width: 5
    fig_height: 3.5
    fig_caption: yes
    df_print: tibble
---
```{r setup, include=FALSE}
library(tidyverse)
library(lubridate)
library(ggplot2)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE)
# Set the root directory for notebook chunks
knitr::opts_knit$set(root.dir = "/Users/Wale/Downloads/Projects/Google Analytics/Cyclistic/Divvy_Project")
```
# Introduction
## Background
Cyclistic offers a bike sharing program that has expanded to a fleet of 5,824 bicycles stationed across 692 stations in Chicago. The company seeks to provide an eco-friendly and convenient means of transportation for both residents and visitors. However, the company aims to increase the subscription rate of its casual riders.
The aim of this article is to provide insights and recommendations that could help the company retain its casual riders and encourage more of them to subscribe.
## Problem Statement
The company has noticed a lower subscription rate among its casual riders compared to its annual members. The company would like to understand the factors contributing to this trend and come up with ways to encourage casual riders to subscribe.
# Data
## Data Description
For this project, historical trip data was obtained from Cyclistic. The analysis covers twelve months of trip data, from Q2 2019 through Q1 2020.
## Methodology
This section describes the packages that were installed and loaded for this project. They are listed below:
* **Tidyverse**: This is a collection of several R packages that work together for data importation, manipulation, visualization and analysis. The packages useful to this project include:
    * **Dplyr**: This package is for data manipulation and provides functions like arrange, filter, mutate and summarize.
    * **Readr**: This package is for reading flat files like CSV.
    * **Stringr**: This package is for string manipulation and provides functions for working with text data.
    * **Ggplot2**: This package was used for all visualizations.
* **Lubridate**: This package was installed to ease working with dates and times, allowing manipulation and formatting of date and time data.
```{r include=FALSE}
# Setting Directory
# First the directory needs to be established for data importation
setwd("/Users/Wale/Downloads/Projects/Google Analytics/Cyclistic/Divvy_Project")
```
```{r message=FALSE, warning=FALSE, include=FALSE}
#Importing Datasets
#The data set were imported, with the data set capturing bike sharing data from 2019 q2 to 2020 q1.
# Uploading csv files
q2_2019 <- read.csv("Divvy_Trips_2019_Q2.csv")
q3_2019 <- read.csv("Divvy_Trips_2019_Q3.csv")
q4_2019 <- read.csv("Divvy_Trips_2019_Q4.csv")
q1_2020 <- read.csv("Divvy_Trips_2020_Q1.csv")
# Viewing column names and structure
str(q2_2019)
colnames(q2_2019)
str(q3_2019)
colnames(q3_2019)
str(q4_2019)
colnames(q4_2019)
str(q1_2020)
colnames(q1_2020)
```
## Data Cleaning
The Tidyverse packages were utilized during data manipulation and wrangling. This section will detail how the data sets were processed prior to analysis.
### Data wrangling and Combining into a Single Data Frame
* **Renaming Columns**: Upon inspection, it was noticed that column names in 2019 q2, q3 & q4 differed from those in 2020 q1. To ensure consistency, the 2019 columns were renamed to match those of 2020.
* **Converting Datatypes**: The datatypes of ride_id and rideable_type were changed to character so that all datasets could be merged.
* **Merging Datasets**: Datasets for the respective quarters were merged into one dataset.
* **Removing Irrelevant Columns**: Columns that were excluded from the 2020 data were dropped as they were deemed irrelevant.
### Prepping Data for Analysis
This section involves renaming columns to ensure uniformity, ensuring data types are consistent, merging data sets, and removing irrelevant columns and bad data. Additionally, new columns are created for data aggregation.
```{r message=FALSE, warning=FALSE, include=FALSE}
# Renaming Columns
#Columns in q2_2019, q3_2019 & q4_2019 to be renamed to make them consistent with q1_2020
q2_2019 <- rename(q2_2019,
                  ride_id = "X01...Rental.Details.Rental.ID",
                  rideable_type = "X01...Rental.Details.Bike.ID",
                  started_at = "X01...Rental.Details.Local.Start.Time",
                  ended_at = "X01...Rental.Details.Local.End.Time",
                  start_station_name = "X03...Rental.Start.Station.Name",
                  start_station_id = "X03...Rental.Start.Station.ID",
                  end_station_name = "X02...Rental.End.Station.Name",
                  end_station_id = "X02...Rental.End.Station.ID",
                  member_casual = "User.Type")
q3_2019 <- rename(q3_2019,
                  ride_id = "trip_id",
                  rideable_type = "bikeid",
                  started_at = "start_time",
                  ended_at = "end_time",
                  start_station_name = "from_station_name",
                  start_station_id = "from_station_id",
                  end_station_name = "to_station_name",
                  end_station_id = "to_station_id",
                  member_casual = "usertype")
q4_2019 <- rename(q4_2019,
                  ride_id = "trip_id",
                  rideable_type = "bikeid",
                  started_at = "start_time",
                  ended_at = "end_time",
                  start_station_name = "from_station_name",
                  start_station_id = "from_station_id",
                  end_station_name = "to_station_name",
                  end_station_id = "to_station_id",
                  member_casual = "usertype")
```
```{r message=FALSE, warning=FALSE, include=FALSE}
# Converting data types in q2_2019, q3_2019 & q4_2019 to match data types in q1_2020
q2_2019 <- mutate(q2_2019, ride_id = as.character(ride_id),
                  rideable_type = as.character(rideable_type))
q3_2019 <- mutate(q3_2019, ride_id = as.character(ride_id),
                  rideable_type = as.character(rideable_type))
q4_2019 <- mutate(q4_2019, ride_id = as.character(ride_id),
                  rideable_type = as.character(rideable_type))
```
```{r message=FALSE, warning=FALSE, include=FALSE}
# Combining Respective Data Sets
bike_trips <- bind_rows(q1_2020, q2_2019, q3_2019, q4_2019)
```
```{r message=FALSE, warning=FALSE, include=FALSE}
# Removing Irrelevant Columns
#birthyear, gender, start_lat, start_lng, end_lat, end_lng, member.gender, X05...member.details.member.birthday.year, tripduration, X01...Rental.details.duration.in.seconds.uncapped
bike_trips <- bike_trips %>%
  select(-c(birthyear, gender, start_lat, start_lng, end_lat, end_lng, Member.Gender,
            "X05...Member.Details.Member.Birthday.Year", "tripduration", "X01...Rental.Details.Duration.In.Seconds.Uncapped"))
```
```{r message=FALSE, warning=FALSE, include=FALSE}
### Inspecting Combined Data Set
summary(bike_trips)
str(bike_trips)
head(bike_trips)
```
```{r message=FALSE, warning=FALSE, include=FALSE}
### Ensuring consistency in member_casual column
bike_trips <- bike_trips %>%
  mutate(member_casual = recode(member_casual,
                                "Subscriber" = "member",
                                "Customer" = "casual"))
```
```{r echo=TRUE, message=FALSE, warning=FALSE}
# Creating New Columns
# New columns (date, month, day, year and day_of_week) were created to enable data aggregation
bike_trips$date <- as.Date(bike_trips$started_at)
bike_trips$month <- format(bike_trips$date, "%m")
bike_trips$day <- format(bike_trips$date, "%d")
bike_trips$year <- format(bike_trips$date, "%Y")
bike_trips$day_of_week <- format(bike_trips$date, "%A")
# Creating a new column holding the duration of each ride in seconds
# (units = "secs" pins the unit instead of letting difftime pick one)
bike_trips$ride_length <- difftime(bike_trips$ended_at, bike_trips$started_at, units = "secs")
is.factor(bike_trips$ride_length)
# Converting the difftime values to plain numbers so they can be aggregated
bike_trips$ride_length <- as.numeric(bike_trips$ride_length)
is.numeric(bike_trips$ride_length)
```
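The duration calculation can be checked on a small example. The sketch below is illustrative only: the timestamps are made up rather than taken from the trip data, and it assumes they parse in the `YYYY-MM-DD HH:MM:SS` format.

```r
# Toy timestamps (hypothetical, not taken from the Divvy data)
started <- c("2019-07-01 08:00:00", "2019-07-01 09:15:00")
ended   <- c("2019-07-01 08:12:30", "2019-07-01 11:15:00")

# Parse explicitly so the result does not depend on implicit coercion
started_at <- as.POSIXct(started, tz = "UTC")
ended_at   <- as.POSIXct(ended, tz = "UTC")

# units = "secs" fixes the unit; the default "auto" chooses one from the data
ride_length <- as.numeric(difftime(ended_at, started_at, units = "secs"))
ride_length
```

The first ride lasts 12 minutes 30 seconds and the second lasts 2 hours, so the values come out as 750 and 7200 seconds.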
```{r echo=TRUE, message=FALSE, warning=FALSE}
# Removing Bad Data
bike_trips_v2 <- bike_trips[!(bike_trips$start_station_name == "HQ QR" | bike_trips$ride_length<0),]
```
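If either column contains missing values, logical subsetting of a base data frame returns an all-`NA` row wherever the condition evaluates to `NA`. A possible alternative, sketched below on hypothetical toy rows, uses `dplyr::filter()` with explicit `is.na()` checks. Whether rows with a missing station name should be kept is an assumption made here for illustration, not a decision taken in this report.

```r
library(dplyr)

# Hypothetical toy rows; the column names mirror those used in this report
trips <- tibble(
  start_station_name = c("HQ QR", "Clark St", "State St", NA),
  ride_length        = c(120, -30, 600, 450)
)

clean <- trips %>%
  filter(
    # Drop quality-check rides, but keep rows whose station name is missing
    is.na(start_station_name) | start_station_name != "HQ QR",
    # Drop rides with a missing or negative duration
    !is.na(ride_length),
    ride_length >= 0
  )

clean
```

Only the "State St" ride and the ride with the missing station name remain in `clean`.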
```{r echo=TRUE, message=FALSE, warning=FALSE}
# Arranging Days of the Week in Order
bike_trips_v2$day_of_week <- ordered(bike_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
```
# Descriptive Analysis
```{r echo=FALSE, message=FALSE, warning=FALSE}
summary(bike_trips_v2$ride_length)
```
### Comparing Members vs Casual Riders
```{r echo=FALSE, message=FALSE, warning=FALSE}
aggregate(bike_trips_v2$ride_length ~ bike_trips_v2$member_casual, FUN = mean)
aggregate(bike_trips_v2$ride_length ~ bike_trips_v2$member_casual, FUN = median)
aggregate(bike_trips_v2$ride_length ~ bike_trips_v2$member_casual, FUN = max)
aggregate(bike_trips_v2$ride_length ~ bike_trips_v2$member_casual, FUN = min)
```
### Comparing Average Ride Times by Day for Members vs Casual Rider
```{r echo=FALSE, message=FALSE, warning=FALSE}
aggregate(bike_trips_v2$ride_length ~ bike_trips_v2$member_casual + bike_trips_v2$day_of_week, FUN = mean)
```
### Analyzing Ridership Data by Type & Weekday
```{r echo=FALSE, message=FALSE, warning=FALSE}
bike_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday)
```
# Results and Findings
### Comparing member vs casual ride counts by weekday
```{r echo=FALSE, message=FALSE, warning=FALSE}
bike_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n()) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Ride Counts by Weekday",
       x = "Weekday",
       y = "Number of Rides",
       fill = "Member/Casual") +
  theme_minimal()
```
The analysis revealed different weekly ride patterns for members and casual riders. Casual riders tend to use the service more during the weekends, while members show consistent usage throughout the week.
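One way to put a number on this pattern is the share of each rider type's trips that start on a weekend. The sketch below uses a handful of hypothetical trips, so its output says nothing about the real data; the same pipeline could be pointed at `bike_trips_v2`, assuming `started_at` parses as a date-time.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy trips; real shares must be computed from the trip data
trips <- tibble(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  started_at    = as.POSIXct(c(
    "2019-07-06 10:00:00",  # Saturday
    "2019-07-07 11:00:00",  # Sunday
    "2019-07-08 12:00:00",  # Monday
    "2019-07-08 08:00:00",  # Monday
    "2019-07-09 08:00:00",  # Tuesday
    "2019-07-06 09:00:00"   # Saturday
  ), tz = "UTC")
)

weekend_share <- trips %>%
  # With week_start = 7, wday() returns 1 for Sunday and 7 for Saturday
  mutate(is_weekend = wday(started_at, week_start = 7) %in% c(1, 7)) %>%
  group_by(member_casual) %>%
  summarise(weekend_share = mean(is_weekend), .groups = "drop")

weekend_share
```

In this toy data, two of the three casual trips and one of the three member trips fall on a weekend.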
### Average ride duration by weekday and rider type
```{r echo=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE}
bike_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(average_duration = mean(ride_length)) %>%
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") +
  # ride_length is stored in seconds, so the axis is labelled accordingly
  labs(title = "Average Ride Duration by Weekday and Rider Type",
       x = "Weekday",
       y = "Average Ride Duration (seconds)",
       fill = "Member/Casual") +
  theme_minimal()
```
The average ride duration provided an interesting insight into bike usage. From the analysis, we see that casual riders tend to ride longer than members. This suggests casual riders use the bikes to explore the city and for leisure purposes.
```{r eval=FALSE, include=FALSE}
# Box plot of ride lengths by Rider Type
ggplot(bike_trips_v2, aes(x = member_casual, y = ride_length, fill = member_casual)) +
  geom_boxplot() +
  labs(title = "Box Plot of Ride Lengths by Rider Type",
       x = "Rider Type",
       y = "Ride Length (seconds)",
       fill = "Member/Casual") +
  theme_minimal()
```
### Time Series of Daily Ride Counts
```{r echo=FALSE, message=FALSE, warning=FALSE}
# n() counts the rides on each date, so this is a daily total rather than an average
bike_trips_v2 %>%
  group_by(date) %>%
  summarise(daily_rides = n()) %>%
  ggplot(aes(x = date, y = daily_rides)) +
  geom_line() +
  labs(title = "Time Series of Daily Ride Counts",
       x = "Date",
       y = "Daily Ride Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
The time series analysis showed usage fluctuation throughout the year. This provides the company the opportunity to anticipate seasonal trends and plan strategic campaigns accordingly.
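Seasonal differences can be easier to read when daily counts are rolled up by month. The following is a minimal sketch on hypothetical daily counts; the `daily` table and its `rides` column are assumptions for illustration and do not reproduce figures from the trip data.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily ride counts for two summer days and two winter days
daily <- tibble(
  date  = as.Date(c("2019-07-01", "2019-07-02", "2020-01-01", "2020-01-02")),
  rides = c(100, 120, 10, 14)
)

monthly <- daily %>%
  # floor_date() maps every date to the first day of its month
  mutate(month = floor_date(date, unit = "month")) %>%
  group_by(month) %>%
  summarise(mean_daily_rides = mean(rides), .groups = "drop")

monthly
```

Each row of `monthly` holds the mean number of rides per day for that month, which makes a summer and winter comparison direct.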
# Recommendations
Based on the findings from the analysis, I would propose the following recommendations:
* **Seasonal Offers**: From the results, we can see that daily rides are higher during the spring and summer months than during the fall and winter months. Therefore, I would recommend the company provide offers and promotions during those periods to retain more riders.
* **Weekday Offers**: To encourage more casual riders to subscribe, I recommend the company offer discounted subscriptions for weekday use, since casual riders tend to use bikes less during weekdays.
* **Collaborations**: The company should partner with local businesses to provide deals or discounts to subscribers. This would encourage casual riders to subscribe.
* **Targeted Campaigns**: The marketing team should create ads specifically focused on casual riders, informing them of subscription benefits such as discounts, partnerships, and weekend deals, as casual riders tend to use bikes more during the weekends.