Date of Completion

8-15-2017

Embargo Period

8-15-2017

Keywords

multi-view data, integrative learning, canonical variate, mixed-response

Major Advisor

Kun Chen

Co-Major Advisor

Dipak K. Dey

Associate Advisor

Haim Bar

Associate Advisor

Jun Yan

Field of Study

Statistics

Degree

Doctor of Philosophy

Open Access

Campus Access

Abstract

The emergence of multi-view data, that is, multiple datasets collected from different sources that measure distinct but interrelated sets of characteristics on the same set of subjects, adds considerable complexity to data analysis. Because of the view-specific characteristics and the interrelationships among views, integrative statistical methodologies are needed. The reduced-rank structure is useful for extracting complex dependence structure, as it achieves dimension reduction in coefficient matrix estimation and admits an appealing latent factor interpretation. We propose two approaches for integrative multivariate regression analysis that incorporate a reduced-rank structure, motivated by two kinds of multi-view data. We first consider data with multi-view covariates together with certain phenotype/outcome variables. The essential task is to integratively extract the possibly low-dimensional association structure among the sets of covariates while utilizing it to build a good predictive model. The proposed canonical variate regression (CVR) bridges the gap between canonical correlation analysis (CCA) and reduced-rank regression (RRR) by examining the interrelationship between multiple sets of features under the supervision of the responses. The non-convex optimization problem is solved by an algorithm based on the alternating direction method of multipliers (ADMM). Simulation studies and two genetic study examples are presented. We also consider data with multi-view responses, in which the mixed-type response variables are interrelated but follow different distributions and contain missing values. The proposed mixed-response reduced-rank regression (mRRR) characterizes the joint dependence structure of the responses by assuming a low-rank structure of the coefficient matrix. An efficient computational algorithm is developed and guaranteed to converge. A non-asymptotic bound for natural parameter estimation under the rank constraint is also explored. Numerical examples, including simulations and a longitudinal study of aging (LSOA), are presented. Limitations of the proposed methods and directions for future work are summarized in the discussion chapter.
