Date of Completion


Embargo Period



Depression Prediction, Smartphone Sensing, Machine Learning, Data Analytics, Internet Traffic Characteristics, Global Positioning System, Wireless Fidelity, Data Integration, Feature Extraction, Fuses

Major Advisor

Bing Wang

Associate Advisor

Alexander Russell

Associate Advisor

Jinbo Bi

Field of Study

Computer Science and Engineering


Doctor of Philosophy

Open Access

Campus Access


Depression is a serious and widespread mental illness that affects 350 million people worldwide. Diagnosis currently relies on clinical interviews or patient self-reports, both of which are limited by recall bias. In addition, clinical interviews require the direct attention of a skilled clinician, which is problematic given the shortage of trained professionals. Furthermore, the interviews typically take place in clinics or treatment centers, which limits their ecological validity. Self-reports require patients to fill in questionnaires consistently over time to monitor their condition, which is burdensome and hence difficult to sustain on a continuous basis. In this dissertation, we explore using sensing data passively collected from mobile devices (smartphones and wearable devices) for automatic depression screening and sleep quality prediction.

In the first part of the dissertation, we develop a novel approach that addresses missing location data on smartphones, yielding more complete location data and more effective depression screening. While location information can be conveniently gathered by GPS, typical datasets suffer from significant periods of missing data due to various factors (e.g., phone power dynamics, limitations of GPS). A common approach is to remove the time periods with significant missing data before data analysis. In our work, we explore another source of location data, WiFi association records, which indicate when a smartphone is associated with a wireless access point (AP) and which are complementary to GPS data. Specifically, WiFi coverage is better inside buildings, and collecting WiFi association records consumes much less energy than using GPS. We develop an approach that fuses location data collected from these two sources on smartphones, and evaluate its performance using a dataset collected from 79 college students. Our evaluation demonstrates that this approach leads to significantly more complete data; the features extracted from the more complete data correlate more strongly with self-reported depression scores and lead to depression prediction with much higher F1 scores (up to 0.76, compared to 0.5 before data fusion).
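The idea of fusing the two location sources can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the fill-in rule (use the associated AP's coordinates whenever a GPS fix is missing), the fixed time slots, the function names `fuse_locations` and `completeness`, and the sample coordinates are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]


def fuse_locations(
    gps: List[Optional[LatLon]],
    wifi_ap: List[Optional[str]],
    ap_coords: Dict[str, LatLon],
) -> List[Optional[LatLon]]:
    """Fill missing GPS slots with the coordinates of the associated AP.

    Each list index is one time slot. This fill-in rule is a hypothetical
    simplification, not the method evaluated in the dissertation.
    """
    fused: List[Optional[LatLon]] = []
    for fix, ap in zip(gps, wifi_ap):
        if fix is not None:
            fused.append(fix)             # keep the GPS fix when one exists
        elif ap is not None and ap in ap_coords:
            fused.append(ap_coords[ap])   # fall back to the AP's known location
        else:
            fused.append(None)            # still missing in both sources
    return fused


def completeness(samples: List[Optional[LatLon]]) -> float:
    """Fraction of time slots that have a location."""
    if not samples:
        return 0.0
    return sum(s is not None for s in samples) / len(samples)


# Made-up example data: five time slots, three without a GPS fix.
gps = [(41.807, -72.253), None, None, (41.810, -72.250), None]
wifi = [None, "ap-library", "ap-library", None, None]
ap_coords = {"ap-library": (41.8065, -72.2520)}

fused = fuse_locations(gps, wifi, ap_coords)
print(completeness(gps), completeness(fused))
```

In this toy input, two of the three missing slots are recovered from WiFi association records, so fewer time periods would need to be discarded before feature extraction.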

In the second part of the dissertation, we explore a novel type of sensing data, coarse-grained meta-data of smartphone Internet traffic, for depression screening. We develop techniques to identify Internet usage sessions (i.e., time periods when a user is online) and extract a novel set of features from the Internet traffic meta-data. Our results demonstrate that Internet usage features reflect behavioral differences between depressed and non-depressed participants, confirming findings in the psychological sciences, which have relied on self-reports rather than on real Internet traffic as in our study. Furthermore, we develop machine learning-based prediction models that use these features to predict depression. Our evaluation shows that Internet usage features can be used for effective depression prediction, leading to an F1 score as high as 0.71.
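One simple way to turn traffic meta-data into usage sessions is to group flow timestamps that lie close together in time. The sketch below assumes this gap-based rule; the 60-second threshold, the function name `find_sessions`, and the sample timestamps are hypothetical and are not taken from the dissertation.

```python
from typing import Iterable, List, Tuple


def find_sessions(
    timestamps: Iterable[float], max_gap: float = 60.0
) -> List[Tuple[float, float]]:
    """Group timestamps (in seconds) into (start, end) usage sessions.

    A new session starts whenever the gap since the previous record
    exceeds max_gap. The threshold is an assumed value for illustration.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][1] <= max_gap:
            sessions[-1][1] = t        # extend the current session
        else:
            sessions.append([t, t])    # open a new session
    return [(start, end) for start, end in sessions]


# Made-up flow timestamps, in seconds since the start of the day.
flows = [0, 10, 30, 500, 520, 2000]
sessions = find_sessions(flows)

# Two example session-level features: count and total online time.
num_sessions = len(sessions)
total_online = sum(end - start for start, end in sessions)
print(sessions, num_sessions, total_online)
```

Features such as session count and total online time could then be aggregated per day or per participant before being passed to a prediction model.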

In the third part of the dissertation, we use smartphone sensing data (e.g., location, activity) to predict one's sleep quality, which is closely related to both mental and physical health. Sleep efficiency (i.e., the ratio of time asleep to total time in bed) is one of the most important metrics for measuring sleep quality. While wearable devices can measure sleep efficiency, they must be worn during sleep. We investigate an alternative approach that uses location data collected from smartphones to predict sleep quality. Specifically, we extract location-based features from smartphone data and develop machine learning-based prediction models to predict sleep efficiency. Furthermore, we explore the impact of the amount of historical data and sequential data on prediction accuracy. Our evaluation shows that location-based features can be used for effective sleep efficiency prediction, leading to an F1 score as high as 0.76.
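The definition of sleep efficiency given above translates directly into code. The function name `sleep_efficiency`, its units (minutes), and the example values below are illustrative assumptions.

```python
def sleep_efficiency(minutes_asleep: float, minutes_in_bed: float) -> float:
    """Return time asleep divided by total time in bed, a value in [0, 1]."""
    if minutes_in_bed <= 0:
        raise ValueError("minutes_in_bed must be positive")
    if not 0 <= minutes_asleep <= minutes_in_bed:
        raise ValueError("minutes_asleep must lie between 0 and minutes_in_bed")
    return minutes_asleep / minutes_in_bed


# Made-up night: 8 hours in bed, 7 of them asleep.
efficiency = sleep_efficiency(minutes_asleep=420, minutes_in_bed=480)
print(efficiency)  # 0.875
```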