Home
Spacv is a spatial machine learning library of cross-validation techniques for assessing how well models generalise to datasets with structural dependence. The library aims to be a one-stop shop for spatial cross-validation and prediction, and is updated in line with new developments in the field. Ultimately, Spacv takes on the technical work required to implement convincing spatial predictive methods, giving researchers more freedom to focus on problems in which the spatial dimension matters.
Spacv is intended for spatial prediction and for training machine learning models on spatial data in a way that accounts for dependence structures in the training and validation data. Users of the library will come from diverse scientific backgrounds, including ecology, spatial epidemiology and population geography, but are unified by an interest in research with a spatial dimension.
Datasets with correlation structures are commonplace in spatial statistical applications. Values of nearby observations tend to be more similar than those of distant observations, and this underlying dependence structure is problematic for model validation, model selection and the estimation of predictive error. Cross-validation procedures typically split the initial dataset into two subsets: a training set for estimating model parameters and a validation set for testing out-of-sample generalisability.
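As a rough illustration of the split described above, the following sketch partitions a set of point locations at random and measures how far each validation point lies from its closest training point. It uses only the Python standard library; the function names, the grid data and the 25% validation fraction are assumptions made for this example and are not part of Spacv.

```python
import math
import random


def random_holdout_split(n_samples, validation_fraction=0.25, seed=0):
    """Randomly partition sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_validation = max(1, round(n_samples * validation_fraction))
    return sorted(indices[n_validation:]), sorted(indices[:n_validation])


def nearest_training_distance(coords, train_idx, validation_idx):
    """Distance from each validation point to its closest training point."""
    return [
        min(math.dist(coords[v], coords[t]) for t in train_idx)
        for v in validation_idx
    ]


# Illustrative data: 100 points on a regular 10 x 10 grid with unit spacing.
coords = [(float(x), float(y)) for x in range(10) for y in range(10)]
train_idx, validation_idx = random_holdout_split(len(coords))
distances = nearest_training_distance(coords, train_idx, validation_idx)

# Under a random split, most validation points sit right next to a training
# point, which is exactly the situation where nearby values are most similar.
print(f"largest nearest-training distance: {max(distances):.2f}")
```

Small nearest-training distances indicate that the validation set is not spatially separated from the training set, so the two cannot be treated as independent samples.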
A critical prerequisite of this procedure is that the data are independent and identically distributed (i.i.d.), and in the spatial context this assumption rarely holds. When data held out for validation are drawn from near a training point, spatial autocorrelation compromises the independence of the evaluation data and the resulting estimates of predictive error are overly optimistic. Spacv provides a collection of cross-validation approaches that help practitioners obtain unbiased error and parameter estimates in their modelling exercises. Users can pick the tools that suit their problem case, alongside several relevant (and recent) utilities for spatial prediction. Examples include:
- h-blocking
- Spatial Leave One Out (SLOO)
- Rabinowicz' bias-corrected CV measure
- Area of applicability for spatial prediction
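To make the idea shared by h-blocking and spatial leave-one-out more concrete, here is a minimal sketch of a buffered leave-one-out split: each observation is held out in turn, and any training observation within a chosen distance of it is discarded. This is a standard-library illustration under assumed names (`buffered_leave_one_out`, `radius`), not Spacv's implementation or API.

```python
import math


def buffered_leave_one_out(coords, radius):
    """Yield (train_indices, test_index) pairs, one per observation.

    Training points within `radius` of the held-out point are dropped, so the
    model is never evaluated on a point that has a close training neighbour.
    """
    for test_index, test_point in enumerate(coords):
        train_indices = [
            i
            for i, point in enumerate(coords)
            if i != test_index and math.dist(point, test_point) > radius
        ]
        yield train_indices, test_index


# Illustrative data: five points along a line, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
splits = list(buffered_leave_one_out(coords, radius=1.5))
for train_indices, test_index in splits:
    print(test_index, train_indices)
```

With a radius of 1.5, holding out the middle point (index 2) leaves only the two end points (indices 0 and 4) for training. In practice the radius would be chosen with reference to the range of spatial autocorrelation in the data.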
Spacv provides a scikit-learn-like interface for cross-validation exercises with correlated data, and its design aims for simplicity of implementation. Spacv is intended to take the complexity of spatial prediction design decisions off the user, giving them greater flexibility to hypothesise about and answer interesting research questions rather than being impeded by technical considerations. Ultimately, Spacv hopes to provide usable functions that let spatial analysts experiment with cutting-edge spatial prediction methods without having to program the tools themselves.
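For readers unfamiliar with the scikit-learn splitter convention, the sketch below shows what such an interface generally looks like: an object configured once, with a `split` method that yields training and test indices fold by fold. The class `BufferedSpatialKFold` and its parameters are hypothetical and written only for illustration; they should not be read as Spacv's actual classes or signatures.

```python
import math
import random


class BufferedSpatialKFold:
    """Hypothetical scikit-learn-style splitter (not part of Spacv's API)."""

    def __init__(self, n_splits=5, buffer_radius=0.0, seed=0):
        self.n_splits = n_splits
        self.buffer_radius = buffer_radius
        self.seed = seed

    def get_n_splits(self):
        """Return the number of folds, mirroring scikit-learn splitters."""
        return self.n_splits

    def split(self, coords):
        """Yield (train_indices, test_indices) for each fold."""
        indices = list(range(len(coords)))
        random.Random(self.seed).shuffle(indices)
        # Deal the shuffled indices into n_splits folds of near-equal size.
        folds = [indices[i::self.n_splits] for i in range(self.n_splits)]
        for test_indices in folds:
            held_out = set(test_indices)
            # Keep a training point only if it lies outside the buffer of
            # every test point in the current fold.
            train_indices = [
                i
                for i in indices
                if i not in held_out
                and all(
                    math.dist(coords[i], coords[j]) > self.buffer_radius
                    for j in test_indices
                )
            ]
            yield sorted(train_indices), sorted(test_indices)


# Illustrative usage on a 6 x 6 grid of points with unit spacing.
coords = [(float(x), float(y)) for x in range(6) for y in range(6)]
cv = BufferedSpatialKFold(n_splits=3, buffer_radius=1.0)
folds = list(cv.split(coords))
for train_indices, test_indices in folds:
    print(len(train_indices), len(test_indices))
```

Because the splitter only yields index lists, it can be combined with any model-fitting loop. The trade-off of a larger buffer is a smaller training set in each fold.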
For completeness, and with respect to existing software, we mention several libraries that include elements of spatial cross-validation. These include:
Spacv differs from existing software in a number of ways. Most fundamentally, it is the only Python library solely dedicated to spatial cross-validation and predictive methods.
- Airola, A., Pohjankukka, J., Torppa, J. et al. (2019). The spatial leave-pair-out cross-validation method for reliable AUC estimation of spatial classifiers. Data Mining and Knowledge Discovery, 33, 730–747. https://doi.org/10.1007/s10618-018-00607-x
- Hijmans, R.J. (2012). Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology, 93, 679–688. https://doi.org/10.1890/11-0826.1
- Meyer, H., Reudenbach, C., Hengl, T., Katurji, M. and Nauss, T. (2018). Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environmental Modelling & Software, 101, 1–9. https://doi.org/10.1016/j.envsoft.2017.12.001
- Meyer, H. and Pebesma, E. (2020). Predicting into unknown space? Estimating the area of applicability of spatial prediction models. https://arxiv.org/abs/2005.07939
- Pohjankukka, J., Pahikkala, T., Nevalainen, P. and Heikkonen, J. (2017). Estimating the prediction performance of spatial models via spatial k-fold cross validation. International Journal of Geographical Information Science, 31(10), 2001–2019. https://doi.org/10.1080/13658816.2017.1346255
- Rabinowicz, A. and Rosset, S. (2019). Cross-Validation for Correlated Data. https://arxiv.org/abs/1904.02438
- Le Rest, K., Pinaud, D., Monestiez, P., Chadoeuf, J. and Bretagnolle, V. (2014). Spatial leave-one-out cross-validation. Global Ecology and Biogeography, 23, 811–820. https://doi.org/10.1111/geb.12161
- Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., Warton, D.I., Wintle, B.A., Hartig, F. and Dormann, C.F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40, 913–929. https://doi.org/10.1111/ecog.02881
- Schratz, P., Muenchow, J., Iturritxa, E., Richter, J. and Brenning, A. (2019). Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data. Ecological Modelling, 406, 109–120. https://doi.org/10.1016/j.ecolmodel.2019.06.002
- Valavi, R., Elith, J., Lahoz-Monfort, J.J. and Guillera-Arroita, G. (2019). blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution, 10, 225–232. https://doi.org/10.1111/2041-210X.13107