
Predict methods for spatial econometrics

Martin G. edited this page Mar 26, 2015 · 12 revisions

Implementing new predict methods for spatial econometrics packages

Summary: Extend the predict methods of the spdep, splm and sphet packages with new predictors from the research literature. Providing predict methods will considerably increase the usefulness of these estimation methods in applied fields such as real estate, public economics, environmental impacts, epidemiology, criminology, industrial location, etc.

Description: Currently, spdep implements only one predict method, covering the LAG and Durbin models (see predict.sarlm and R. Bivand (2002), Journal of Geographical Systems). No predict functions are included in splm and sphet. The goal of this proposal is to develop predict methods in line with recent developments in the spatial econometrics literature.

In- and out-of-sample predictors for spatial econometrics models differ from the classical OLS predictor because of the nature of these models: including spatially lagged variable(s) makes the conditional expected value of Y more complicated to compute. The recent literature proposes new in- and out-of-sample predictors; for the LAG model, see Thomas-Agnan et al. (2014).
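
As a worked illustration in standard notation (the symbols are assumptions, not taken from this page): let W be the spatial weights matrix, ρ the spatial autoregressive parameter, and assume that I − ρW is invertible and E[ε | X] = 0. Solving the LAG model for y gives its reduced form, and the simplest predictor follows from it:

$$
y = \rho W y + X\beta + \varepsilon
\;\Longrightarrow\;
y = (I - \rho W)^{-1}(X\beta + \varepsilon),
\qquad
E[y \mid X] = (I - \rho W)^{-1} X\beta .
$$

The OLS predictor would be Xβ alone; here every prediction depends on the covariates of all locations through (I − ρW)^{-1}. Other predictors in the literature condition on more information, for example on observed values of y at other locations, which is why several competing predictors exist.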

Unlike spatial statistics, spatial econometrics has not developed well-tried prediction methods, despite their potential usefulness, for example, as pointed out by Thomas-Agnan et al. (2014), for making estimates during incomplete census campaigns. The problem of prediction is linked to the global and/or local simultaneous dependency (spillover) between cross-sectional observations, and to the effects/impacts of covariates, which are not equal to the regression coefficients.
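
The gap between impacts and regression coefficients can be shown with a small numeric sketch. It assumes a LAG model with a single covariate and a made-up, row-standardised weights matrix for three regions; the values of `rho` and `beta`, the helper names, and the direct/indirect/total summary (a common convention in the literature, not something defined on this page) are all illustrative assumptions. It is not the API of spdep, splm or sphet.

```python
# Sketch: in a LAG model y = rho*W*y + x*beta + eps, the effect of x on E[y]
# is (I - rho*W)^{-1} * beta, not beta. All numbers below are made up.

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spatial_multiplier(w, rho, terms=200):
    """Approximate (I - rho*W)^{-1} by the series I + rho*W + rho^2*W^2 + ...

    The series converges when W is row-standardised and |rho| < 1.
    """
    n = len(w)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    acc = [row[:] for row in identity]
    power = [row[:] for row in identity]  # holds rho^k * W^k
    for _ in range(terms):
        power = [[rho * v for v in row] for row in matmul(power, w)]
        acc = [[a + p for a, p in zip(ar, pr)] for ar, pr in zip(acc, power)]
    return acc

# Three regions on a line (1 - 2 - 3), row-standardised weights (hypothetical).
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
rho, beta = 0.4, 2.0  # illustrative parameter values

# impacts[i][j] is the change in E[y_i] for a unit change in x_j.
impacts = [[beta * v for v in row] for row in spatial_multiplier(W, rho)]
n = len(W)
avg_direct = sum(impacts[i][i] for i in range(n)) / n  # own-region effect
avg_total = sum(sum(row) for row in impacts) / n       # effect of changing x everywhere
avg_indirect = avg_total - avg_direct                  # spillover from other regions

print(f"beta={beta} direct={avg_direct:.3f} "
      f"indirect={avg_indirect:.3f} total={avg_total:.3f}")
```

With row-standardised weights the average total impact equals beta / (1 - rho), so both the direct and the total impact exceed the coefficient whenever rho is positive.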

The completion of this project will be signalled by the successful implementation and deployment of predict methods for cross-sectional models in spdep and sphet, and the prototyping of predict methods for non-dynamic spatial panel models in splm. This is feasible in three months; should these goals be met ahead of schedule, the next targets include prototyping methods for dynamic spatial panels and/or prediction standard errors and/or epredict methods related to the estimability package.

Context: R is currently the most advanced platform for estimating spatial econometrics models. These models are very promising given the growing amount of geo-data. In this context, many statisticians estimate models under the assumption of independence between observations. In many applications this assumption does not hold, and dependence between spatial observations then leads to biased and inefficient estimates. We think improved predictions will be very useful in the application domains of spatial econometrics: house prices, hedonic models, epidemiology, criminology, industrial economics, etc.

Related work: An early-alpha project initiated by JS Ay et al. is available on GitHub. It includes only BLUP functions for spdep (for the LAG model; Thomas-Agnan et al. (2014) show that there are many more predictors than these). Some functions are also missing, such as predictors for splm and sphet. The student can fork this code, but the goal of this GSoC is to go substantially beyond what was implemented over ten years ago, and to verify, extend and unify the methods involved.

Potential tasks:

  • review the literature on in-/out-of-sample predictors for spatial econometrics models (in cross-section, panel, and with heteroskedastic innovations) to assess whether the legacy implementation can be revised or should be replaced
  • examine how to automate access to legacy packages on R-Forge from git so that testing may use git if desired (fallback is using SVN on R-Forge)
  • think about a common interface for predict methods across the three packages: some modifications to the output of the model-fitting functions may be necessary
  • implement the predictor methods and test them (unit tests would be appreciated)
  • possibly consider the relevance of enhancements in the estimability package
  • smoothly integrate new content into the three legacy packages on R-Forge (as patches for existing files in /R and /man and NAMESPACE, and as new files)

Skills required:

  • good coding experience with R
  • some knowledge of the spatial econometrics and statistical estimation literature
  • ability to implement some functions in C in order to compare performance between R and C
  • at least basic git and subversion knowledge

Tests:

  • The estimation of the SARAR model by maximum likelihood is currently broken; try to find out why
  • For the sphet package, work out what the returned object needs to contain (and therefore what needs to be changed)

Mentor: Roger Bivand [@](mailto:roger {dot} bivand {at} nhh {dot} no) and Giovanni Millo [@](mailto:giovanni {dot} millo {at} generali {dot} com) as backup mentor.

References:

  • Bivand RS (2002). "Spatial Econometrics Functions in R: Classes and Methods." Journal of Geographical Systems, 4, 405-421.
  • Bivand RS, Piras G (2015). "Comparing Implementations of Estimation Methods for Spatial Econometrics." Journal of Statistical Software, 63(18), 1-36.
  • Millo G, Piras G (2012). "splm: Spatial Panel Data Models in R." Journal of Statistical Software, 47(1), 1-38.
  • Millo G (2014). "Maximum likelihood estimation of spatially and serially correlated panels with random effects." Computational Statistics & Data Analysis, 71 (March), 914-933.
  • Piras G (2010). "sphet: Spatial Models with Heteroskedastic Innovations in R." Journal of Statistical Software, 35(1), 1-21.
  • Thomas-Agnan C, Laurent T, Goulard M (2014). "About predictions in spatial autoregressive models: Optimal and almost optimal strategies." TSE Working Paper, n. 13-452, December 18, 2013, revised September 2014.