Date of Completion
8-7-2020
Embargo Period
8-7-2020
Keywords
Bayesian Analysis, Imbalanced Response Data, Hurdle Model, Skewed Link Binary Regression, K-prototype Clustering
Major Advisor
Dipak K. Dey
Co-Major Advisor
Emiliano Valdez
Associate Advisor
Victor Hugo Lachos Davila
Field of Study
Statistics
Degree
Doctor of Philosophy
Open Access
Campus Access
Abstract
Modeling imbalanced data sets, in which the classes contain disproportionate numbers of observations, is a common problem in regression and classification. Imbalanced data arise in many areas, such as mine safety operations and life insurance. The imbalance between the majority (non-event) and minority (event) classes can produce misleading output and poses a great challenge. Although the information contained in the majority class is very important, the hazard rate or mortality rate is estimated and analyzed using samples from the minority class. Overestimating or underestimating the probability of an event directly affects an individual's life and safety and a company's financial well-being. Therefore, the study of the imbalance problem is vital. This dissertation reviews possible ways to handle the imbalanced class problem for count and binary response variables, along with techniques for Bayesian inference such as Markov chain Monte Carlo (MCMC) methods and the exchange algorithm. To analyze different types of response variables with imbalanced distributions, a zero-inflated model with a skewed link and a generalized count distribution, binary regression with skewed links, and a generalized clustering algorithm are developed using MCMC techniques. Three applications to real data sets, drawn from mine data and life insurance data, show how the proposed methods are employed to achieve accurate Bayesian inference.
Recommended Citation
Yin, Shuang, "Bayesian Analysis for Imbalanced Datasets" (2020). Doctoral Dissertations. 2615.
https://digitalcommons.lib.uconn.edu/dissertations/2615