---
title: Wednesday Birders - Using the eBird API, Python, and R to analyze data for our birding group
author: Mark Isken
date: 2018-04-16
category: R, python
tags: R, python, birding, dplyr, ggplot2, pandas
summary: Using the eBird API from Python, we downloaded birding lists submitted by our Wednesday morning birding group. R was then used to create some basic plots showing the top species sighted each year. More analysis to follow.
output: html_document
---
## The Wednesday Birders
Since 2015, a small but growing group of birders has met each Wednesday
morning to [bird one of the parks in Oakland Township](https://oaklandnaturalareas.com/volunteer-calendar/birding-walks/).
We have a mix of birding experience levels, a shared love of nature, and a
dedication to stewardship of natural areas. The founder of the group is
a scientist/naturalist (PhD in biology/botany) and the [Natural Areas Manager for Oakland Township](http://www.oaklandtownship.org/boards_and_commissions/parks_and_recreation/stewardship.php). So, not only do we get to bird, we get to learn a ton
about the flora of the area. Our group is also fortunate to have a gifted
writer and photographer who blogs about our parks at the [Natural Areas Notebook](https://oaklandnaturalareas.com/).
Since the group's inception,
our bird lists have been entered into [eBird](https://ebird.org/home),
making it easy to answer those "Hey, have we ever seen a [insert random
bird species] in this park?" queries. Now that we've got a few years
of weekly data, it's time for some birding analysis. In this first post,
I'll describe how I:
* used the eBird API (2.0) with Python from a Jupyter notebook to download data from our bird lists into a pandas dataframe, which was then exported to a CSV file,
* used R to clean up the data to make sure we were just using our Wednesday Birders
lists for the analysis,
* used the R packages dplyr and ggplot2 to summarize and make plots of
species counts by year.
## Downloading our data from eBird
[eBird.org](http://ebird.org/content/ebird/) is an extremely popular online site for entering bird sightings. It was started by the [Cornell Lab of Ornithology](http://www.birds.cornell.edu) and the [National Audubon Society](https://www.audubon.org/) and has revolutionized birding
by making it easy for anyone to enter observational data into a shared
database and then to access that database through simple-to-use interfaces
within web browsers or mobile apps.
Not only does eBird make it easy for you to enter sightings and manage your own lists of birds seen, it has a nice set of tools for exploring the massive amount of data it collects.
* Summary graphs and tables
* Search for recent sightings in "hotspots" or by any location
* Interactive species maps
* ... and even more goodies
You can [download your own data](https://ebird.org/downloadMyData) or the [whole dataset](https://ebird.org/data/download) through
the eBird website. There is also an API that makes it
easy to programmatically download a variety of detailed and summary data.
The [eBird API 1.1](https://confluence.cornell.edu/display/CLOISAPI/eBird+API+1.1) is still available but people are urged to migrate to the
new [eBird API 2.0](https://documenter.getpostman.com/view/664302/ebird-api-20/2HTbHW).
I'm going to use Python to do the
data download. In order to use the eBird API 2.0 you need to [obtain a free API key](https://ebird.org/ebird/api/keygen).
```python
api_key = 'put_your_api_key_here'
```
The Wednesday Birders cycle through four different parks each month. I just manually
grabbed the locIDs for these parks and stuffed them into a dictionary.
```python
hotspot_ids = {'Bear Creek Nature Park': 'L2776037',
               'Cranberry Lake Park': 'L2776024',
               'Charles Ilsley Park': 'L2905470',
               'Draper Twin Lake Park': 'L1581963'}
```
We'll need to use a few libraries.
```python
import pandas as pd
import requests
import time  # used to put a 0.5 second delay between API calls
```
Set the date range for the download: 2015-01-01 through 2018-02-28.

```python
start_date = pd.Timestamp('20150101')
end_date = pd.Timestamp('20180228')
num_days = (end_date - start_date).days + 1
rng = pd.date_range(start_date, periods=num_days, freq='D')
```
Just a little bit of Python code needed to grab the data through a series
of web API calls.
```python
# Base URL for eBird API 2.0
url_base_obs = 'https://ebird.org/ws2.0/data/obs/'

# Create a list to hold the individual dictionaries of observations
observations = []

# Loop over the locations of interest and dates of interest
for loc_id in hotspot_ids.values():
    for d in rng:
        time.sleep(0.5)  # time delay to be kind to the eBird servers
        ymd = '{}/{}/{}'.format(d.year, d.month, d.day)

        # Build the URL
        url_obs = url_base_obs + loc_id + '/historic/' + ymd + \
            '?rank=mrec&detail=full&cat=species&key=' + api_key
        print(url_obs)

        # Get the observations for one location and date
        obs = requests.get(url_obs)

        # Append the new observations to the master list
        observations.extend(obs.json())

# Convert the list of dictionaries to a pandas dataframe
obs_df = pd.DataFrame(observations)

# Check out the structure of the dataframe
print(obs_df.info())

# Check out the first few rows
obs_df.head()

# Export the dataframe to a csv file
obs_df.to_csv("observations.csv", index=False)
```
## Data prep
All of the data prep and analysis is done in R. We'll need a few libraries:
```{r libs}
library(dplyr)
library(ggplot2)
library(lubridate)
```
Before diving into analysis and plots, a little data prep is needed:
* read CSV file into an R dataframe
* convert datetime fields to POSIXct
* include only the lists from our Wednesday morning walks
```{r prepobs}
# Read in the csv file
obs_raw <- read.csv("./data/observations.csv")
# Convert date field to POSIXct
obs_raw$obsDt <- as.POSIXct(obs_raw$obsDt)
# Create list of our birders who have entered >= 1 list
list_authors <- c("VanderWeide", "Isken", "Kriebel")
# Filter out lists not done on Wed by one of the list authors
obs_df <- obs_raw %>%
filter(lastName %in% list_authors & wday(obsDt) == 4)
# Check out the first few rows
head(obs_df)
saveRDS(obs_df, file = "observations.rds")
```
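One aside on the `wday(obsDt) == 4` condition above: with lubridate's default week numbering, Sunday is day 1, which makes Wednesday day 4. Here's a quick sanity check (using a date I know was a Wednesday, and which happens to be the end of our download range):

```{r wday_check, eval=FALSE}
# lubridate's default week_start has Sunday = 1, so Wednesday = 4
wday(as.Date("2018-02-28"))                # 4
wday(as.Date("2018-02-28"), label = TRUE)  # Wed
```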
## Plots of Species Counts
How many birds of each species have we seen? How frequently is each species
seen?

Let's start with simple bar charts:

* one bar per species, one year per graph,
* bar length is the number of birds seen,
* bar color reflects the percentage of lists on which that species was seen,
* the number at the end of each bar is the percentage of lists on which that species was seen.
![2015](images/top30_2015.png)
![2016](images/top30_2016.png)
![2017](images/top30_2017.png)
A few observations:
* Familiar year-round friends such as Canada Goose, American Robin, Black-capped Chickadee, Blue Jay and American Goldfinch are sighted in large numbers and on most outings.
* Large flocks of European Starlings lead to a high number of individual sightings even though the species appears relatively infrequently in our lists. In 2017, one big flock of Ring-necked Ducks gave them the title of most birds seen that year!
* The overall composition of the lists is pretty similar across the three
years. However, overall numbers appear to be down in 2017. This is
in spite of the fact that we had more outings (lists) in 2017 (48) than in 2016 (39).
This requires more investigation.
## Creating the plots
```{r create_topobs, echo=FALSE}
# Create num species by list df
numsp_bylist <- obs_df %>%
group_by(year=year(obsDt), obsDt, subId, lastName) %>%
count() %>%
arrange(year, subId)
# Using numsp_bylist, create num lists by date
numlists_bydt <- numsp_bylist %>%
group_by(obsDt) %>%
summarise(
numlists = n()
) %>%
filter(numlists >= 1) %>%
arrange(obsDt)
# Using numlists_bydt, create num lists by year
numlists_byyear <- numlists_bydt %>%
group_by(birding_year=year(obsDt)) %>%
summarise(
totlists = sum(numlists)
)
# Now ready to compute species by year
species_byyear <- obs_df %>%
group_by(comName, birding_year=year(obsDt)) %>%
summarize(
num_lists = n(),
tot_birds = sum(howMany)
) %>%
arrange(birding_year, desc(tot_birds))
# Join to numlists_byyear so we can compute pct of lists each species
# appeared in.
species_byyear <- left_join(species_byyear, numlists_byyear, by = 'birding_year')
bird_year <- 2017
top_obs_byyear <- species_byyear %>%
filter(birding_year == bird_year) %>%
mutate(pctlists = num_lists / totlists) %>%
arrange(desc(tot_birds)) %>%
head(30)
```
The plots above are easy to create from a dataframe that looks like this:
```{r top_obs_byyear, echo=FALSE}
as.data.frame(top_obs_byyear)
```
The only real trickiness is getting the percentage-of-lists column computed.
We can do it in a few steps using dplyr. For the example below I've just
hard-coded 2017 as the year of interest. In reality, I embedded the code
below in a function and passed the year of interest in (see the sketch after the chunk).
```{r create_topobs_2017, echo=TRUE}
# Create num species by list dataframe
numsp_bylist <- obs_df %>%
group_by(year=year(obsDt), obsDt, subId, lastName) %>%
count() %>%
arrange(year, subId)
# Using numsp_bylist, create num lists by date
numlists_bydt <- numsp_bylist %>%
group_by(obsDt) %>%
summarise(
numlists = n()
) %>%
filter(numlists >= 1) %>%
arrange(obsDt)
# Using numlists_bydt, create num lists by year
numlists_byyear <- numlists_bydt %>%
group_by(birding_year=year(obsDt)) %>%
summarise(
totlists = sum(numlists)
)
# Now ready to compute species by year
species_byyear <- obs_df %>%
group_by(comName, birding_year=year(obsDt)) %>%
summarize(
num_lists = n(),
tot_birds = sum(howMany)
) %>%
arrange(birding_year, desc(tot_birds))
# Join to numlists_byyear so we can compute pct of lists each species
# appeared in.
species_byyear <- left_join(species_byyear, numlists_byyear, by = 'birding_year')
# These would be passed in to function version of this code
bird_year <- 2017
ntop <- 30
# Compute the percentage of lists on which each species appeared
top_obs_byyear <- species_byyear %>%
filter(birding_year == bird_year) %>%
mutate(pctlists = num_lists / totlists) %>%
arrange(desc(tot_birds)) %>%
head(ntop)
```
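For reference, here's a rough sketch of what that function version might look like. The function name `top_obs_for_year()` and its signature are my own choices for illustration; it just wraps the last few steps above and assumes `species_byyear` has already been built:

```{r top_obs_fn, eval=FALSE}
# Sketch of a function version: filter one year, compute pct of lists,
# and keep the ntop most-sighted species
top_obs_for_year <- function(species_byyear, bird_year, ntop = 30) {
  species_byyear %>%
    filter(birding_year == bird_year) %>%
    mutate(pctlists = num_lists / totlists) %>%
    arrange(desc(tot_birds)) %>%
    head(ntop)
}

top_obs_byyear <- top_obs_for_year(species_byyear, 2017)
```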
Finally, we are ready to make the plot. For this post I'm cheating a bit by
hard-coding a y-axis limit. In the function version, this can be passed in
(see the sketch after the chunk below).
```{r plot2017}
ggplot(top_obs_byyear) +
geom_bar(aes(x=reorder(comName, tot_birds),
y=tot_birds, fill=pctlists), stat = "identity") +
scale_fill_gradient(low='#05D9F6', high='#5011D1') +
labs(x="",
y="Total number of birds sighted",
fill="Pct of lists",
title = paste0("Top 30 most sighted birds by Wednesday Birders in ", bird_year),
subtitle = "(number at right of bar is % of lists on which the species appeared)") +
coord_flip() +
geom_text(data=top_obs_byyear,
aes(x=reorder(comName,tot_birds),
y=tot_birds,
label=format(pctlists, digits = 1),
hjust=0
), size=3) + ylim(0,550)
```
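And a matching sketch of the plot wrapped in a function, with the year and y-axis limit passed in rather than hard-coded (again, `plot_top_obs()` and its argument names are just illustrative):

```{r plot_top_obs_fn, eval=FALSE}
# Sketch: same plot as above, with bird_year and ymax as arguments
plot_top_obs <- function(top_obs, bird_year, ymax) {
  ggplot(top_obs) +
    geom_bar(aes(x = reorder(comName, tot_birds),
                 y = tot_birds, fill = pctlists), stat = "identity") +
    scale_fill_gradient(low = '#05D9F6', high = '#5011D1') +
    labs(x = "",
         y = "Total number of birds sighted",
         fill = "Pct of lists",
         title = paste0("Top 30 most sighted birds by Wednesday Birders in ", bird_year),
         subtitle = "(number at right of bar is % of lists on which the species appeared)") +
    coord_flip() +
    geom_text(aes(x = reorder(comName, tot_birds),
                  y = tot_birds,
                  label = format(pctlists, digits = 1),
                  hjust = 0), size = 3) +
    ylim(0, ymax)
}

plot_top_obs(top_obs_byyear, 2017, 550)
```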
## Next steps
Now that we've got the raw data downloaded and cleaned up, we can do a bunch
of exploratory analysis and our Wednesday morning birding group will know a
little more about what we've been seeing.