Preparing predictor data

miturbide edited this page Jun 20, 2018 · 21 revisions

Introduction

This document illustrates the preparation of different predictor configurations in perfect-prog experiments.

Broadly speaking, there are two main configurations, which can optionally be complemented with local predictors:

  1. Using atmospheric fields "as they are" for a given spatial domain.
  2. Using principal components (PCs) obtained from these fields. These can be PCs calculated for a single variable and/or combined PCs calculated jointly from several predictor variables.

In addition to the PCs or raw atmospheric fields, which provide a synoptic descriptor, local information on a particular variable or set of variables can also be included as a predictor in the calibration phase (e.g. the surface temperature in the nearest grid cells around a given station).

The large number of options involved in fine-tuning a downscaling method calls for a flexible yet easily configurable interface that enables users to launch complex experiments testing different predictor setups. The function prepareData has been designed for this purpose. A few reproducible examples are presented in this vignette.

A note on the terminology used

In the climate4R bundle (see e.g. Cofiño et al. 2017), atmospheric variables are stored in so-called data grids. In order to efficiently handle the multiple variables used as predictors in downscaling experiments, "stacks" of grids are used. These are known as multiGrids, and can be obtained with the constructor makeMultiGrid from a set of dimensionally consistent grids.

Example data

Predictors

Daily data from the NCEP reanalysis (Kalnay et al. 1996) are used as an example. In particular, a domain centered on the Iberian Peninsula is considered, and three variables (mean sea-level pressure psl, specific humidity at 850 mb hus850 and air temperature at 850 mb ta850) are used as predictors. These are built-in example datasets from the package transformeR of the climate4R bundle. See for instance help("NCEP_Iberia_hus850", package = "transformeR") for further details.

library(transformeR)
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850")
plotClimatology(climatology(NCEP_Iberia_psl), backdrop.theme = "coastline",
                main = "Mean DJF SLP (1983-2002)")
plotClimatology(climatology(NCEP_Iberia_hus850), backdrop.theme = "coastline",
                main = "Mean DJF hus850 (1983-2002)")
plotClimatology(climatology(NCEP_Iberia_ta850), backdrop.theme = "coastline",
                main = "Mean DJF ta850 (1983-2002)")

The grids are already spatially and temporally consistent, so they can be stacked in a multiGrid structure:

x <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)

Predictands

The predictands correspond to the observations used for downscaling. These are typically meteorological station data or interpolated gridded datasets. In this example, we use a subset of the stations used in a large intercomparison experiment of downscaling methods performed in the framework of the COST Action VALUE (Maraun et al. 2015). The target variable is daily air temperature (the built-in dataset VALUE_Iberia_tas of transformeR).

data("VALUE_Iberia_tas")
y <- VALUE_Iberia_tas
plotClimatology(climatology(y), backdrop.theme = "countries", cex = 1.5,
                main = "Mean winter daily tas (1983-2002)")

Although the aim of this example is to prepare only the predictors (the predictands are not manipulated), the predictand is always required in order to ensure the spatio-temporal consistency of the experiment. prepareData internally handles non-overlapping temporal periods, ensuring a perfect match between predictors and predictand prior to model calibration.
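The temporal matching described above can be illustrated with a minimal base R sketch. It is not the internal code of prepareData or getTemporalIntersection; the date ranges and the object names (x_dates, y_dates, x_vals, y_vals) are made up for illustration only.

```r
# Hypothetical daily date vectors for predictors and predictand (toy data)
x_dates <- seq(as.Date("1983-01-01"), as.Date("1983-01-10"), by = "day")
y_dates <- seq(as.Date("1983-01-06"), as.Date("1983-01-15"), by = "day")

# Toy values: one predictor series and one station series
x_vals <- seq_along(x_dates)
y_vals <- seq_along(y_dates) * 10

# Keep only the days present in both series
# (dates are compared as character strings to avoid class issues)
common <- intersect(as.character(x_dates), as.character(y_dates))
x_sub <- x_vals[as.character(x_dates) %in% common]
y_sub <- y_vals[as.character(y_dates) %in% common]

length(common)  # number of overlapping days
```

After this step both series have the same length and refer to the same days, which is the condition required before calibrating any model.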

Worked examples

library(downscaleR)

Brief description of the output predictor data structure

In this section we briefly describe the main characteristics of the output object of prepareData. It is in essence a named list with different components, whose content varies depending on the predictor setup:

  • y: the predictand object. If required, it is a subset of the original one, taken to ensure temporal consistency with the predictors (this is achieved internally through the helper function getTemporalIntersection in transformeR).
  • x.global: the global predictor data matrix. In this example, it is a matrix in which rows correspond to times (days in this case) and columns to locations, arranged via the internal helper transformeR::array3Dto2Dmat. If there is more than one predictor variable, as in the example, the matrices for each variable are simply bound by columns. In other cases, for instance when using PCs, this matrix contains the selected PCs (rows are times and columns are PCs, in decreasing order of explained variance, according to the user specifications).
  • x.local: if local predictors are used, the local information is included here. This is a list of matrices, of the same length as the number of locations in the predictand, each one containing the local information that will be appended to the global matrix. Note that, for consistency with the structure of the local newdata used for model prediction, it is organized as a nested list of the form sites --> members. For predictors there is only a single member, so this dimension is not strictly needed here, but it is necessary in order to make predictions with multimember models (e.g. seasonal forecasts).
  • pca: This element stores the whole PCA (this is the output of the transformeR::prinComp function). It contains metadata required in subsequent steps.
  • In addition, several attributes are appended to the output to aid in the traceability of the predictor structure.
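As a rough illustration of the x.global layout for raw fields, the following base R sketch flattens two toy 3D arrays (time x lat x lon) into matrices and binds them by columns. It is not the implementation of transformeR::array3Dto2Dmat, whose column ordering may differ; the helper to_2d and the toy arrays are hypothetical.

```r
set.seed(1)
n_time <- 4; n_lat <- 2; n_lon <- 3

# Two toy predictor fields with dimensions time x lat x lon
psl <- array(rnorm(n_time * n_lat * n_lon), dim = c(n_time, n_lat, n_lon))
ta  <- array(rnorm(n_time * n_lat * n_lon), dim = c(n_time, n_lat, n_lon))

# Flatten the spatial dimensions: one row per time step, one column per grid point
# (hypothetical helper, for illustration only)
to_2d <- function(a) matrix(as.vector(a), nrow = dim(a)[1])

# Bind the per-variable matrices by columns
x_global <- cbind(to_2d(psl), to_2d(ta))
dim(x_global)  # 4 rows (times), 2 * 3 * 2 = 12 columns
```

The number of columns grows with both the size of the domain and the number of variables, which is one motivation for the PC-based configurations.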

These elements are shown next for several case studies, building upon the example datasets described in the previous section.

Case studies

Using the raw predictor variables

In this situation, the multigrid is passed directly to prepareData. Note that only x (the predictor multigrid) and y (the predictand, stations) are given a value, while the remaining arguments are left at their default value NULL (so they could be omitted):

out <- prepareData(x = x,
                          y = y,
                          global.vars = NULL,
                          spatial.predictors = NULL,
                          local.predictors = NULL)
str(out)
str(attributes(out))

Using PCs as predictors

In this example, instead of using the raw fields as predictors, we will use principal components. The principal component analysis can be tuned through the arguments admitted by transformeR::prinComp (detailed in the help of that function), which are passed to the argument spatial.predictors of prepareData in the form of a named list.

Note that here we will retain the first 10 PCs of the first variable and the first 5 PCs of each of the other two variables contained in the multigrid, which are:

getVarNames(x)

The number of PCs to retain is given through n.eofs, with one value per variable:

out <- prepareData(x = x,
                          y = y,
                          spatial.predictors = list(n.eofs = c(10,5,5))
)

In this case, the output element pca contains the full output of prinComp, which will be needed in subsequent steps of the downscaling. There is also a global attribute PCA.pars containing metadata on the PCA options chosen. Note that the x.global element of the output now contains the PC matrix of all the variables in the input grid, and thus it has $10+5+5=20$ columns, corresponding to the number of PCs for each variable indicated in n.eofs.

str(out)
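The column count of x.global in the per-variable PC setup can be reproduced with a minimal base R sketch. It uses stats::prcomp on random toy matrices rather than transformeR::prinComp, so scaling details may differ from the package; the list fields and the vector n_eofs are hypothetical names.

```r
set.seed(42)
n_time <- 60; n_grid <- 30

# Three toy predictor matrices (rows = days, columns = grid points)
fields <- list(hus850 = matrix(rnorm(n_time * n_grid), n_time),
               psl    = matrix(rnorm(n_time * n_grid), n_time),
               ta850  = matrix(rnorm(n_time * n_grid), n_time))
n_eofs <- c(10, 5, 5)  # PCs to retain per variable, same order as 'fields'

# PCA per variable, keeping only the leading PCs (scores) of each one
pcs <- Map(function(m, k) prcomp(m, center = TRUE, scale. = TRUE)$x[, seq_len(k)],
           fields, n_eofs)

# Bind the retained PCs of all variables by columns
x_global <- do.call(cbind, pcs)
ncol(x_global)  # 10 + 5 + 5 = 20
```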

Introducing local predictors

Finally, it is possible to specify local predictors using the corresponding argument. It is passed as a named list with the following elements:

  • vars: the names of the variables to be used as local predictors
  • n: the number of nearest points/grid-boxes to the predictand location to be used
  • fun: the aggregation function of the selected neighbours (if any). If NULL (the default), the nearest neighbours are not aggregated but just appended to the predictor matrix.

TIP: In order to select variables, either for local predictors or for multigrid subsetting via global.vars, the helper getVarNames from transformeR may be useful:

getVarNames(x)

Next, we will use the raw fields of air temperature at 850 mb and sea-level pressure as global predictors (global.vars = c("ta@850", "psl")). In addition, specific humidity at 850 mb will be used as a local predictor (vars = "hus@850"). In this example, we will consider the 4 closest points to the predictand location (n = 4), and no aggregation of the neighbours is undertaken (fun = NULL; this argument could be omitted, as this is the default).

out <- prepareData(x = x,
                          y = y,
                          global.vars = c("ta@850", "psl"),
                          local.predictors = list(vars = "hus@850",
                                                  n = 4,
                                                  fun = NULL)
)
str(out)

Instead of using all 4 neighbours separately, these can be aggregated, for instance using their average value as the local predictor (fun = list(FUN = "mean")), or with any other user-defined function, adding any additional arguments needed to the fun list (for instance, the 90th percentile would be specified as fun = list(FUN = "quantile", probs = .9), etc.).

out <- prepareData(x = x,
                          y = y,
                          global.vars = c("ta@850", "psl"),
                          local.predictors = list(vars = "hus@850",
                                                  n = 4,
                                                  fun = list(FUN = "mean"))
)
str(out)
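The idea behind the local predictor options (n nearest grid points, optionally aggregated by fun) can be sketched in base R as follows. This is only an illustration under simplifying assumptions: the grid, the station coordinates and the toy humidity matrix are made up, and plain Euclidean distance in degrees is used, which is not necessarily how downscaleR selects neighbours.

```r
set.seed(7)
# Toy regular grid (made-up coordinates) and one hypothetical station
grid <- expand.grid(lon = seq(-10, 5, by = 2.5), lat = seq(35, 45, by = 2.5))
station <- c(lon = -3.7, lat = 40.4)

# Toy local variable: rows = days, columns = grid points
n_time <- 5
hus <- matrix(runif(n_time * nrow(grid)), nrow = n_time)

# Indices of the n grid points closest to the station
# (Euclidean distance in degrees, a simplification)
n <- 4
d <- sqrt((grid$lon - station["lon"])^2 + (grid$lat - station["lat"])^2)
nearest <- order(d)[seq_len(n)]

x_local_raw  <- hus[, nearest]                                   # no aggregation: n columns
x_local_mean <- apply(hus[, nearest], 1, mean)                   # mean of the neighbours
x_local_q90  <- apply(hus[, nearest], 1, quantile, probs = 0.9)  # 90th percentile
```

Without aggregation the station gets n extra predictor columns; with an aggregation function it gets a single one.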

Combining PCs with local predictors

This is probably the most typical configuration of predictors. The global information of the PCs is complemented with local information from an additional variable. For instance, in this example we retain temperature and sea-level pressure as global predictors, using their PCs, and include local information for specific humidity at 850 mb. In this case, we indicate that we want to keep the PCs explaining no less than 95% of the total variance for each global variable (v.exp = c(0.95, 0.95); note that in the previous PCA example we indicated the number of PCs (argument n.eofs) instead of the amount of explained variance):

out <- prepareData(x = x,
                          y = y,
                          global.vars = c("ta@850", "psl"),
                          spatial.predictors = list(v.exp = c(.95, .95)),
                          local.predictors = list(vars = "hus@850",
                                                  n = 4,
                                                  fun = NULL)
)
str(out)

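The x.global matrix now has one column per retained PC, that is, the number of PCs retained for sea-level pressure plus the number retained for air temperature. The exact counts can be checked with ncol(out$x.global) or in the output of str(out).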
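Selecting PCs by explained variance amounts to finding the smallest number of leading PCs whose cumulative variance fraction reaches the threshold. The base R sketch below shows this with stats::prcomp on a toy field; it is not the code of transformeR::prinComp, and the names field, v_exp and n_keep are hypothetical.

```r
set.seed(3)
# Toy field with spatially correlated columns (rows = days, columns = grid points)
n_time <- 100; n_grid <- 20
signal <- matrix(rnorm(n_time * 3), n_time) %*% matrix(rnorm(3 * n_grid), 3)
field  <- signal + matrix(rnorm(n_time * n_grid, sd = 0.3), n_time)

pca <- prcomp(field, center = TRUE, scale. = TRUE)

# Cumulative fraction of variance explained by the leading PCs
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of PCs whose cumulative explained variance reaches 95%
v_exp <- 0.95
n_keep <- which(cum_var >= v_exp)[1]
pcs <- pca$x[, seq_len(n_keep), drop = FALSE]
```

Unlike n.eofs, the number of columns obtained this way depends on the data, so it generally differs between variables and domains.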

Another option is to use joint (combined) PCs as predictors. To do so, the names of the variables to be joined when calculating the PCs are passed through the element which.combine of spatial.predictors:

out <- prepareData(x = x,
                          y = y,
                          spatial.predictors = list(v.exp = 0.95, which.combine = getVarNames(x)),
                          local.predictors = list(vars = "hus@850",
                                                  n = 4,
                                                  fun = NULL)
)
str(out)
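Conceptually, a combined PCA is a single PCA computed on the joint matrix of several variables. The base R sketch below standardizes two toy fields before joining them, so that neither dominates because of its units; how transformeR::prinComp scales and combines the fields internally is an assumption here, and all object names are hypothetical.

```r
set.seed(11)
n_time <- 80; n_grid <- 15
# Toy fields with very different magnitudes (rows = days, columns = grid points)
psl <- matrix(rnorm(n_time * n_grid, mean = 101300, sd = 800), n_time)
ta  <- matrix(rnorm(n_time * n_grid, mean = 275, sd = 5), n_time)

# Standardize each field, bind by columns and compute one PCA on the joint matrix
joint <- cbind(scale(psl), scale(ta))
pca <- prcomp(joint, center = FALSE, scale. = FALSE)

# Keep the combined PCs explaining at least 95% of the joint variance
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_keep <- which(cum_var >= 0.95)[1]
x_global <- pca$x[, seq_len(n_keep), drop = FALSE]
```

Each retained column now mixes information from both variables, in contrast with the per-variable PCs of the previous example.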

References

  • Cofiño, A.S., Bedia, J., Iturbide, M., Vega, M., Herrera, S., Fernández, J., Frías, M.D., Manzanas, R., Gutiérrez, J.M., 2017. The ECOMS User Data Gateway: Towards seasonal forecast data provision and research reproducibility in the era of Climate Services. Climate Services. doi:10.1016/j.cliser.2017.07.001

  • Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Leetmaa, A., Reynolds, R., Chelliah, M., Ebisuzaki, W., Higgins, W., Janowiak, J., Mo, K.C., Ropelewski, C., Wang, J., Jenne, R., Joseph, D., 1996. The NCEP/NCAR 40-Year Reanalysis Project. Bulletin of the American Meteorological Society 77, 437–471. doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2

  • Maraun, D., Widmann, M., Gutiérrez, J.M., Kotlarski, S., Chandler, R.E., Hertig, E., Wibig, J., Huth, R., Wilcke, R.A.I., 2015. VALUE: A framework to validate downscaling approaches for climate change studies. Earth’s Future 3, 2014EF000259. doi:10.1002/2014EF000259

Session info

print(sessionInfo(package = c("transformeR", "downscaleR")))