Bayesian analysis of compositional data

Date of Completion

January 1997






Compositional data are constrained vectors of multivariate observations whose elements are referred to as components. Such vectors often result when raw data are normalized or when data are obtained as proportions of a certain heterogeneous quantity. These conditions are fairly common in geology, economics and biology. Compositional data are subject to two restrictions: non-negativity and a unit-sum constraint on the components. The sample space is therefore a simplex, a subset of real space whose dimension is a function of the number of components.

Usual multivariate procedures are seldom adequate, and appropriate modeling techniques were slow to emerge because these data do not admit the usual concepts of independence and because the simplex lacks a rich class of parametric distributions. In the past, Dirichlet distributions were used in parametric modeling of compositional data, although the Dirichlet class is inherently unsuitable for describing such data. More recently, Aitchison (1982) proposed a statistically feasible methodology in the frequentist paradigm using logistic normal distributions.

Aitchison's idea relies primarily on the fact that an additive logratio transformation produces data that can be modeled under an assumption of normality (equivalently, the compositions are logistic normal). Clearly, this assumption may not always be valid. A further caveat in connection with Aitchison's approach is that there is no satisfactory technique for verifying logistic normality. Marginal tests have been adopted as a possible answer, but these tests may uncover only part of the truth.

In an attempt to provide a general methodology that is more tolerant of data behavior, Box-Cox transformations are studied here in a Bayesian paradigm. Rayens and Srinivasan (1991) examined this in a classical setup, with certain strong assumptions on the transformed data to enable the theory.
Here, a general Bayesian methodology for this problem is presented, and simulation-based methods are adopted to identify the most appropriate choice of parameters. Dynamic modeling of correlated compositional data is also investigated under the Box-Cox transformation and compared to regression models with vector autoregressive moving average errors. Finally, semiparametric Bayesian modeling under the generalized Liouville distribution is presented as a viable alternative for modeling compositional data within the simplex. To summarize, existing methods for the analysis of compositional data have been extended significantly.
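
As a minimal sketch of the two transformations named above, the following Python code maps a composition from the simplex to unconstrained real space. It is an illustration only, not the method developed in the dissertation. The function names `alr` and `boxcox_ratio`, the choice of the last component as the reference, and the application of the Box-Cox family to the ratios x_i / x_D are all assumptions made for this example.

```python
import numpy as np

def alr(x):
    """Additive logratio: log(x_i / x_D) for i = 1..D-1.

    Assumes the last component x_D is the reference (an illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def boxcox_ratio(x, lam):
    """Box-Cox transform applied to the ratios x_i / x_D (assumed form).

    For lam == 0 this reduces to the additive logratio transform.
    """
    x = np.asarray(x, dtype=float)
    ratios = x[..., :-1] / x[..., -1:]
    if lam == 0:
        return np.log(ratios)
    return (ratios**lam - 1.0) / lam

# A hypothetical three-part composition (non-negative, sums to one).
comp = np.array([0.2, 0.3, 0.5])

y_alr = alr(comp)                # two unconstrained coordinates
y_bc = boxcox_ratio(comp, 0.5)   # same dimension, different curvature
```

Under these assumptions, `lam` indexes a family that contains the logratio transform as the limiting case `lam = 0`, which is one way to read the abstract's description of a methodology that is more tolerant of data behavior than a fixed logratio transform.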