Bayesian spatial regression analysis with large data sets

Date of Completion

January 2001

Full inference for large spatial databases incorporating spatial association in a stochastic fashion is a challenging undertaking. Hierarchical Bayesian models provide an attractive framework for achieving these goals. These models are fitted using iterative Markov chain Monte Carlo (MCMC) algorithms, whose complexity, for hierarchical models, grows with the sample size. Implementing these algorithms requires special expertise and often involves writing new software, especially in the context of large data sets.

We propose novel methodologies for a wide range of spatial processes, all of which involve working with massive data sets. The first problem studies the relationship between deforestation and population pressure in a tropical rainforest of Madagascar, where land use, derived from spectral signatures obtained from the Advanced Very High Resolution Radiometer (AVHRR) scanner, is available at a pixel resolution of 1 km x 1 km, while population is available only at the town level. Areal allocation schemes that allocate a spatially extensive variable, such as population, from one set of spatial units to another using covariate information have been well studied in the literature. Recently, the allocation problem has also been solved by explicitly incorporating spatial association into the model through a hierarchical modeling approach. We add another layer of complexity by simultaneously implementing regression along with areal allocation for a massive data set.

In the second problem, we develop explicit spatial models for species count data in which, at most of the sampling sites, the species was not observed. The usual approach of modeling count data with an excessive number of zeros using a zero-inflated Poisson (ZIP) model has been well studied in the literature.
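To fix ideas, the ZIP likelihood mixes a point mass at zero with a Poisson count. The sketch below is purely illustrative (the function name, parameterization, and pure-Python implementation are ours, not the dissertation's model, which additionally carries spatial random effects):

```python
import math
import numpy as np

def zip_loglik(y, pi, lam):
    """Log-likelihood of a zero-inflated Poisson (ZIP) model.

    P(Y = 0)     = pi + (1 - pi) * exp(-lam)          # structural + sampling zeros
    P(Y = k > 0) = (1 - pi) * exp(-lam) * lam**k / k!
    pi and lam may be scalars or per-site arrays (hypothetical interface).
    """
    y = np.asarray(y, dtype=float)
    pi = np.broadcast_to(np.asarray(pi, dtype=float), y.shape)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), y.shape)
    total = 0.0
    for yi, p_i, l_i in zip(y.ravel(), pi.ravel(), lam.ravel()):
        if yi == 0:
            # zero can come from the degenerate or the Poisson component
            total += math.log(p_i + (1.0 - p_i) * math.exp(-l_i))
        else:
            # positive counts can only come from the Poisson component
            total += (math.log1p(-p_i) - l_i
                      + yi * math.log(l_i) - math.lgamma(yi + 1.0))
    return total
```

In a Bayesian fit, this likelihood would be combined with priors on the zero-inflation probability and the Poisson mean (and, in the spatial extension, with a spatial random effect inside `lam`).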
Our contribution here is twofold: first, we address issues of posterior propriety and propose methods for determining informative priors for some of the model parameters in a Bayesian context; second, we explicitly incorporate spatial association within our modeling framework.

While the first two problems deal with areal unit data, in the third and last problem we discuss a computationally feasible strategy for fitting a nonstationary spatial model to large point-source data by modeling the error process as a distance-weighted average of locally stationary processes using a Bayesian formulation. To automate model fitting in the context of large data sets, we developed an algorithm which we call the "slice Gibbs sampler". The methodology is illustrated by applying this spatial model to transactions data for single-family properties sold in Dallas County and the Dallas Independent School District over the 1995-1996 period.
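The nonstationary construction can be sketched as w(s) = sum_j k_j(s) w_j(s), where each w_j is an independent stationary Gaussian process and k_j(s) is a normalized kernel in the distance from s to the j-th local center. The simulation below is a minimal sketch under assumed choices (exponential covariances, Gaussian distance kernels, and all function names ours), not the dissertation's exact specification:

```python
import numpy as np

rng = np.random.default_rng(1)

def exp_cov(coords, sigma2, phi):
    """Stationary exponential covariance: sigma2 * exp(-||s_i - s_j|| / phi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def nonstationary_field(coords, centers, local_params, bw):
    """Simulate w(s) = sum_j k_j(s) w_j(s): a distance-weighted average
    of independent locally stationary Gaussian processes.

    local_params: list of (sigma2, phi) pairs, one per local center.
    bw: bandwidth of the Gaussian distance kernel (assumed form).
    """
    n = coords.shape[0]
    # Gaussian kernels in distance to each local center, normalized over j
    d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * bw ** 2))
    K /= K.sum(axis=1, keepdims=True)
    w = np.zeros(n)
    for j, (sigma2, phi) in enumerate(local_params):
        C = exp_cov(coords, sigma2, phi) + 1e-10 * np.eye(n)  # jitter
        w_j = rng.multivariate_normal(np.zeros(n), C)
        w += K[:, j] * w_j
    return w
```

Because each local process has its own covariance parameters, the induced covariance of w varies over the domain, which is what makes the model nonstationary while keeping each building block simple and stationary.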
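The dissertation's "slice Gibbs sampler" augments the posterior with auxiliary slice variables so that each Gibbs update draws from a region where the density exceeds a uniform level. The details of that algorithm are not given here; as a generic point of reference, a standard univariate slice sampler with stepping-out and shrinkage (in the style of Neal's slice sampling) looks like this:

```python
import numpy as np

def slice_sample(logf, x0, n_draws, w=1.0, rng=None):
    """Generic univariate slice sampler (stepping-out + shrinkage).

    logf: unnormalized log density; x0: starting value; w: step size.
    Illustrative only -- not the dissertation's slice Gibbs sampler.
    """
    rng = rng if rng is not None else np.random.default_rng()
    x = x0
    out = np.empty(n_draws)
    for i in range(n_draws):
        # draw the auxiliary "slice" level u ~ Uniform(0, f(x)), on the log scale
        logy = logf(x) + np.log(rng.uniform())
        # step out until the interval [L, R] brackets the slice
        L = x - w * rng.uniform()
        R = L + w
        while logf(L) > logy:
            L -= w
        while logf(R) > logy:
            R += w
        # shrink the interval until a point inside the slice is accepted
        while True:
            x1 = rng.uniform(L, R)
            if logf(x1) > logy:
                x = x1
                break
            if x1 < x:
                L = x1
            else:
                R = x1
        out[i] = x
    return out
```

Embedded inside a Gibbs cycle, such updates require no tuning of proposal distributions, which is what makes slice-based schemes attractive for automating MCMC on large hierarchical models.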