Date of Completion


Embargo Period



Missing data, self-report, bio-marker, multiple imputation

Major Advisor

Ofer Harel

Associate Advisor

Haiying Wang

Associate Advisor

Victor Hugo Lachos Davila

Field of Study



Doctor of Philosophy

Open Access

Open Access


Joint analysis of self-report and biomarker measurements provides new opportunities to understand and characterize human behaviors. Self-report measures are the most common way to assess human behavior because they are quick, straightforward, and inexpensive, but they are limited by factors such as recall bias, which tends toward under-reporting. A wide variety of biological measurements have therefore been developed to assess human behaviors objectively. However, the accuracy of a biological measurement can also vary between studies, not just through chance but also with changes in the study setting, the spectrum of disease, and the definition of the target condition. Hence, self-report measures and biological markers together are likely to provide the basis for a more accurate estimate of participants' behavior than either does alone, which is why simultaneous analysis of the two is appealing. There are two major research issues with such joint analysis. First, when researchers combine a biological marker and self-report measures as explanatory variables in a longitudinal analysis, the problem of multicollinearity arises. Second, in longitudinal studies, variables recorded over the course of the study are prone to missing observations. The data motivating our research arise from a longitudinal cohort study of patients at an HIV clinic in southwestern Uganda who were not yet eligible for antiretroviral therapy (ART). Beginning in 2011, 447 patients were recruited, with follow-up visits every 6 months for up to 3 years. The objective of the study is to examine the relationship between alcohol use and HIV disease progression, measured by CD4 cell count, among ART-naive, HIV-infected Ugandans. A self-report measure, the Alcohol Use Disorders Identification Test-Consumption (AUDIT-C), and a biological marker, phosphatidylethanol (PEth), are both used to measure alcohol use.
To address the correlation between the AUDIT-C score and PEth, we propose a Bayesian shrinkage prior in the setting of a linear mixed model. To handle missing observations in the response and in time-dependent covariates, we propose a two-stage multiple imputation procedure for the missing response and the missing time-varying covariates in longitudinal data. Finally, we extend the two-stage multiple imputation approach by introducing the Bayesian shrinkage prior into the imputation process, so that a partly observed response and partly observed, correlated time-dependent covariates are accounted for simultaneously. We carry out a detailed analysis of the data using the proposed approaches, and we conduct simulation studies to compare them with existing approaches.
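As a point of reference for the multiple-imputation step, the following minimal sketch pools estimates from several imputed data sets using the standard single-stage combining rules (Rubin's rules). It is an illustration only and is not the two-stage procedure proposed in this dissertation, which may use different combining rules. The function name `pool_rubin` and the input values are hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set.
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between          # total variance of the pooled estimate
    return q_bar, total, math.sqrt(total)


# Hypothetical coefficient estimates from three imputed data sets.
pooled, total_var, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing values; analysing a single imputed data set alone would ignore it.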