---
title: "Climate Futures Toolbox"
output:
  github_document:
    html_preview: True
---
# Welcome to the Climate Futures Toolbox
This package was developed as a collaboration between Earth Lab and the North Central Climate Adaptation Science Center to help users gain insights from available climate data. It
includes tools and instructions for downloading climate data via a USGS API and then organizing those
data for visualization and analysis.
The package is currently growing to include better functionality for spatial analyses and more user-friendly features. Thank you to all the wonderful beta tester groups who helped us get the software this far. Please be patient as we update some of the functions and vignettes to accommodate more functionality.
# What you'll find here
This vignette walks through a common use case of the cft package: helping users
download, organize, and visualize past and future climate data. It covers:
1. How to download and install the cft package
2. How to see the menu of available data and choose items from that menu
3. A description of the two functions available in the cft package and their primary use cases
## Why write the cft package?
The amount of data generated by downscaled GCMs can be quite large
(e.g., daily data at a few km spatial resolution).
The Climate Futures Toolbox was developed to help users access and use
smaller subsets.
Data are acquired from the [Northwest Knowledge Server of the University of
Idaho](http://thredds.northwestknowledge.net:8080/thredds/reacch_climate_CMIP5_macav2_catalog2.html).
### What you'll need
To get the most out of this vignette, you should have:
- At least 500 MB of disk space
- Some familiarity with ggplot2
- Some familiarity with dplyr (e.g., [`filter()`](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter), [`group_by()`](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/group_by), and [`summarise()`](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/summarise))
## About the data
Global Circulation Models (GCMs) provide estimates of historical and future
climate conditions.
The complexity of the climate system has led to a large number of GCMs, and it is
common practice to examine outputs from many different models, treating each as
one plausible future.
Most GCMs are spatially coarse (often 1 degree), but downscaling provides finer
scale estimates. The cft package uses one downscaled climate model called MACA
(Multivariate Adaptive Climate Analog) Version 2
([details here](http://www.climatologylab.org/maca.html)).
# Installing the cft package from GitHub
```{r install cft, warning=FALSE, message=FALSE, eval=FALSE}
library(devtools)
install_github("earthlab/cft", force = TRUE)
```
## Attach cft and check the list of available functions
```{r}
library(dplyr)      # provides %>%, filter(), and pull() used in later chunks
library(multidplyr)
library(cft)
ls(pos = "package:cft")
```
## Look at the documentation for those functions
```{r manual}
?available_data
```
```{r}
?single_point_firehose
```
# Use read-only mode to find available data without initiating a full download.
```{r available data, cache=TRUE}
inputs <- cft::available_data()
```
# Downloading data
There are too many data available to fetch in a single download request. You will need to limit each request to 500 MB. This is enough to download a single variable for a single spatial point for the full available time period, but not more than that. This means that we must filter the results from available_data() to specify the data that we actually want to download. If you want to pull multiple variables and/or multiple spatial locations, you will need to submit multiple requests for data and merge those tables together after download. We provide examples for both.
Note that if you want to download multiple variables in their entirety for several specific lat/long locations, single_point_firehose() will serve you better than available_data(). The single_point_firehose() function parallelizes the download requests, which allows it to download a large quantity of data much faster than available_data(). It also combines the data from the parallelized download requests into a single sf spatial dataframe. A vignette walking through a common use of single_point_firehose() can be found at https://github.com/earthlab/cft/blob/main/vignettes/firehose.md
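
Merging the tables from several requests, as described above, might look like the following minimal sketch. The data frames here are toy stand-ins for downloaded tables, and the column names (`time`, `lat`, `lon`, `tasmax`, `pr`) and values are assumptions made for illustration; real cft output has its own column names, so adjust the join keys to match.

```r
library(dplyr)

# Toy stand-ins for two separate download results (hypothetical columns and values).
temps <- tibble(
  time = as.Date(c("2006-01-01", "2006-01-02")),
  lat = 44.5, lon = -110.5,
  tasmax = c(270.1, 271.4)
)
precip <- tibble(
  time = as.Date(c("2006-01-01", "2006-01-02")),
  lat = 44.5, lon = -110.5,
  pr = c(0.0, 1.2)
)

# Different variables at the same place and time: join on the shared keys.
by_variable <- full_join(temps, precip, by = c("time", "lat", "lon"))

# The same variable at a different location: stack the rows.
temps_site2 <- mutate(temps, lat = 45.0, lon = -111.0)
by_location <- bind_rows(temps, temps_site2)
```

A join adds columns when the requests differ by variable, while stacking adds rows when they differ by location or time period.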
## We can look at just the unique variable types to get an idea of what's available
```{r}
levels(as.factor(inputs$variable_names$Variable))
levels(as.factor(inputs$variable_names$`Variable abbreviation`))
levels(as.factor(inputs$variable_names$Scenario))
levels(as.factor(inputs$variable_names$`Scenario abbreviation`))
levels(as.factor(inputs$variable_names$Model))
levels(as.factor(inputs$variable_names$`Model abbreviation`))
```
## But we prefer to use the table version so we can easily select combinations of variables that we want to pull together.
Here you will use tidy notation to filter the available_data table to only include the entries that you would like to download.
## Filter variable names
This filter includes all of the climate models, all of the scenarios, and 5 variables. It is a big request.
```{r filter variables many, cache=TRUE}
input_variables <- inputs$variable_names %>%
  filter(Variable %in% c("Maximum Relative Humidity",
                         "Maximum Temperature",
                         "Minimum Relative Humidity",
                         "Minimum Temperature",
                         "Precipitation")) %>%
  filter(Scenario %in% c("RCP 4.5", "RCP 8.5")) %>%
  filter(Model %in% c(
    "Beijing Climate Center - Climate System Model 1.1",
    "Beijing Normal University - Earth System Model",
    "Canadian Earth System Model 2",
    "Centre National de Recherches Météorologiques - Climate Model 5",
    "Commonwealth Scientific and Industrial Research Organisation - Mk3.6.0",
    "Community Climate System Model 4",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Generalized Ocean Layer Dynamics",
    "Geophysical Fluid Dynamics Laboratory - Earth System Model 2 Modular Ocean",
    "Hadley Global Environment Model 2 - Climate Chemistry 365 (day) ",
    "Hadley Global Environment Model 2 - Earth System 365 (day)",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Low Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5A - Medium Resolution",
    "Institut Pierre Simon Laplace (IPSL) - Climate Model 5B - Low Resolution",
    "Institute of Numerical Mathematics Climate Model 4",
    "Meteorological Research Institute - Coupled Global Climate Model 3",
    "Model for Interdisciplinary Research On Climate - Earth System Model",
    "Model for Interdisciplinary Research On Climate - Earth System Model - Chemistry",
    "Model for Interdisciplinary Research On Climate 5",
    "Norwegian Earth System Model 1 - Medium Resolution")) %>%
  pull("Available variable")
input_variables
```
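
For contrast with the big request, a much narrower filter can be sketched offline on a toy table. The table below reuses the column names from the chunk above, but its rows and the `Available variable` strings are made up for illustration and are not real entries from available_data().

```r
library(dplyr)

# Toy table with the same column names as inputs$variable_names;
# all rows and identifier strings are hypothetical.
toy_variable_names <- tibble(
  Variable = c("Maximum Temperature", "Maximum Temperature", "Precipitation"),
  Scenario = c("RCP 4.5", "RCP 8.5", "RCP 8.5"),
  Model = c("Community Climate System Model 4",
            "Community Climate System Model 4",
            "Canadian Earth System Model 2"),
  `Available variable` = c("example_tasmax_rcp45",
                           "example_tasmax_rcp85",
                           "example_pr_rcp85")
)

# One variable, one scenario, one model: a single entry to request.
small_request <- toy_variable_names %>%
  filter(Variable == "Maximum Temperature",
         Scenario == "RCP 8.5",
         Model == "Community Climate System Model 4") %>%
  pull("Available variable")
```

Starting from a narrow filter like this and widening it gradually is one way to stay under the request size limit.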
# Climate Futures Toolbox Functions
As previously mentioned, the Climate Futures Toolbox includes two functions: available_data() and single_point_firehose(). Both download data from the MACA climate model, but they have slightly different use cases, which are described below.
## Available Data Function
The available_data() function can be used to download data from the MACA climate model. A download can include multiple variables, emission scenarios, and climate models over a chosen time period and spatial region. However, each download request made with available_data() is limited to 500 MB. A request that exceeds the limit can still be submitted, but the download will not finish and an error message will be produced. Note that there is no single cut-off on the number of variables, emission scenarios, and climate models that can be requested for a given time period and spatial region; the limit applies to the total size of the request.

If you need more than 500 MB of data, you can either call available_data() several times and stitch the results together, or use single_point_firehose(), which parallelizes the download requests and combines the data into a single spatial dataframe. An example of making multiple download requests with available_data() is shown in the available_data vignette at https://github.com/earthlab/cft/blob/main/vignettes/available-data.md. As that vignette notes, stitching together the data from multiple available_data() requests is computationally expensive, so single_point_firehose() is the easier and more efficient choice if you encounter errors when downloading with available_data().
**Overall, the available_data function works best for downloading MACA climate model data for several climate variables from several climate models for any number of emission scenarios over a relatively small spatial region and over a relatively short time period.**
## Firehose Function
The single_point_firehose() function is also used to download data from the MACA climate model. Unlike available_data(), it parallelizes multiple download requests across the cores of your computer. This speeds up the download and lets users request multiple climate variables from multiple models and emission scenarios in their entirety, because the work is broken up so that no single request exceeds the 500 MB limit. Once the parallelized requests finish, single_point_firehose() combines the data into a single spatial dataframe.

It is important to note that single_point_firehose() retrieves these data **for a single lat/long location**. If you need the same data at multiple lat/long locations, you will need to call single_point_firehose() once per location and combine the results. An example of using single_point_firehose() to download MACA climate model data is shown in the firehose vignette at https://github.com/earthlab/cft/blob/main/vignettes/firehose.md.
**Therefore, the single_point_firehose function works best for downloading MACA climate model data for multiple climate variables from multiple climate models and emission scenarios in their entirety for a single lat/long location.**
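
Calling the function once per location and combining the results could follow a pattern like the sketch below. Here `fetch_point()` is a hypothetical stand-in that returns a toy table so the example runs offline; in real use its body would be a call to single_point_firehose() with the arguments described in `?single_point_firehose`. The site names, coordinates, and column names are likewise assumptions.

```r
library(dplyr)

# Hypothetical stand-in for one single-location download.
# Replace the body with a real single_point_firehose() call.
fetch_point <- function(lat, lon) {
  tibble(
    lat = lat, lon = lon,
    time = as.Date("2006-01-01") + 0:1,
    tasmax = c(270.1, 271.4)
  )
}

# Locations of interest (made-up coordinates).
sites <- tibble(
  site = c("A", "B"),
  lat  = c(44.5, 45.0),
  lon  = c(-110.5, -111.0)
)

# One call per location, labelling each result with its site name.
results <- lapply(seq_len(nrow(sites)), function(i) {
  fetch_point(sites$lat[i], sites$lon[i]) %>%
    mutate(site = sites$site[i])
})

# Stack the per-site tables into one.
all_sites <- bind_rows(results)
```

If the per-site results are sf objects, `do.call(rbind, results)` is an alternative way to stack them.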