Novel approaches in modeling spatially correlated multivariate data

Date of Completion

January 2006

Modeling spatially correlated data has gained increased attention in recent years, particularly due to the realization that accounting for spatial clustering and variation could enrich the information obtained. Investigators have also shown that neighborhood or area characteristics may be related to disease progression and health independently of individual-level characteristics. Motivated by real data, this dissertation proposes novel approaches to modeling spatially correlated data in two settings: survival models and generalized linear models, in particular Poisson regression.

First, survival modeling for individuals with multiple cancers is investigated. Data were obtained from the SEER (Surveillance, Epidemiology, and End Results) database of the National Cancer Institute, which provides a fairly sophisticated platform for exploring novel approaches in modeling cancer survival. Semiparametric and parametric Bayesian hierarchical multiple cancer survival models that account for spatial clustering and variation are proposed. For the semiparametric setup, a proportional hazards (PH) framework was followed, with the baseline hazard rate modeled using a mixture of beta distributions. The parametric setting included both the proportional hazards and proportional odds structures, with baseline distributions given by the Weibull and log-logistic distributions, respectively. Model comparison and diagnostics were implemented using the conditional predictive ordinate (CPO) approach.

Attention is then shifted from survival modeling to the Poisson regression setting. Using census-tract-level asthma hospitalization data for New York City from 1997 to 2000, Poisson regression models are proposed that incorporate known asthma risk factors as well as the potential contribution of unobserved ecological-level variables. The effects of unobserved risk factors are accounted for through spatial frailties, which capture region-wide heterogeneity, possible clustering, or even spatiotemporal trends. Results indicate that including these spatial frailties could improve model fit and enrich the conclusions that may be derived beyond those of the basic model. Model comparison and diagnostics were implemented using a modified deviance information criterion (DIC) and a proposed mean square error (MSE) criterion.
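As a rough illustration of how a Poisson regression with frailties can be scored, the sketch below computes the standard DIC from posterior draws. It is not the modified DIC or the MSE criterion proposed in the dissertation, whose definitions are not given here. The sketch assumes each tract's log-mean takes the form eta_i = log(offset_i) + x_i'beta + phi_i, where phi_i is a spatial frailty; the function names and the toy numbers are hypothetical.

```python
import math

def poisson_deviance(y, eta):
    """Return -2 * Poisson log-likelihood; eta[i] is the log of the mean for tract i."""
    return -2.0 * sum(
        yi * ei - math.exp(ei) - math.lgamma(yi + 1) for yi, ei in zip(y, eta)
    )

def dic(y, eta_draws):
    """Standard DIC = D_bar + p_D, computed from posterior draws of the log-mean.

    eta_draws is a list of draws; each draw is a list with one log-mean per tract
    (assumed to already include offset, covariate effects, and frailty).
    """
    m = len(eta_draws)
    n = len(y)
    # Posterior mean of the deviance
    d_bar = sum(poisson_deviance(y, eta) for eta in eta_draws) / m
    # Deviance evaluated at the posterior mean of the log-mean
    eta_mean = [sum(draw[i] for draw in eta_draws) / m for i in range(n)]
    d_hat = poisson_deviance(y, eta_mean)
    # Effective number of parameters
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d

# Toy, made-up inputs: 3 tracts, 3 posterior draws
y = [3, 0, 5]
eta_draws = [[1.0, -0.5, 1.5], [1.2, -0.7, 1.6], [0.9, -0.4, 1.4]]
dic_value, p_d = dic(y, eta_draws)
```

Under this parameterization, a model with frailties would be preferred over the basic model when its DIC is smaller, although the dissertation's own modified criterion may rank models differently.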