Date of Completion
8-24-2019
Embargo Period
8-9-2029
Keywords
Spatial clustering; Clustering; Single-cell; Hi-C
Major Advisor
Yuping Zhang
Associate Advisor
Ming-Hui Chen
Associate Advisor
Zhiyi Chi
Associate Advisor
Joseph Glaz
Field of Study
Statistics
Degree
Doctor of Philosophy
Open Access
Open Access
Abstract
Recent advances in biotechnologies such as single-cell RNA sequencing and Hi-C assays produce huge amounts of unlabelled data and open the door to many lines of biomedical research, such as transcriptional characterization of individual cells and comprehensive investigation of chromosomal conformation. In this thesis, we study the problem of using unsupervised methods, such as clustering and spatial scan clustering, to extract patterns and learn representations from single-cell RNA-seq and Hi-C data.
To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this dissertation research, we propose a graph-based clustering method, Linf-SClust, and a distribution-based approach, RDMM, to extract cluster configurations from two different perspectives. Linf-SClust is a novel tuning-free graph-based model that constructs the graph using the l-infinity measure and the entropy equalizer similarity, and partitions the graph via spectral clustering. Parameter tuning and determination of the number of clusters are guided by the Gap statistic, which makes Linf-SClust a fully automatic approach. Our other method, RDMM, is a regularized Dirichlet-Multinomial finite-mixture model that addresses the gene expression clustering problem in a compositional fashion. The advantages of Linf-SClust and RDMM are shown through simulations and real applications.
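As a rough illustration of the distribution-based view, the sketch below evaluates the standard Dirichlet-Multinomial log-probability of a count vector and assigns a cell to the mixture component with the highest weighted likelihood. It is a minimal sketch, not the RDMM method itself: the regularization and the fitting procedure are not shown, and the function names `dirichlet_multinomial_logpmf` and `assign_cluster`, as well as the example parameters, are hypothetical.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-Multinomial.

    counts: non-negative integer counts (e.g. reads per gene for one cell).
    alpha:  positive concentration parameters, one per gene.
    """
    n = sum(counts)
    a = sum(alpha)
    # Normalizing term: log( n! * Gamma(a) / Gamma(n + a) )
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # Per-category term: log( Gamma(x + alpha) / (x! * Gamma(alpha)) )
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(x + 1) - math.lgamma(al)
    return log_p


def assign_cluster(counts, alphas, weights):
    """Index of the component maximizing log(weight) + log-likelihood.

    alphas and weights are assumed to be already estimated; how they are
    fitted (and regularized) is outside the scope of this sketch.
    """
    scores = [
        math.log(w) + dirichlet_multinomial_logpmf(counts, a)
        for a, w in zip(alphas, weights)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-gene example with two equally weighted components.
label = assign_cluster([9, 1], alphas=[[10.0, 1.0], [1.0, 10.0]], weights=[0.5, 0.5])
```

Working with log-gamma terms avoids overflow for the large counts typical of sequencing data.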
The Hi-C experiment enables assessment of chromosomal structural information, including the detection of structural variations, especially translocations. In this dissertation research, we formulate inter-chromosomal translocation detection as a scan clustering problem in a spatial point process. We then develop TranScan, a new translocation detection method based on scan statistics with false discovery control. Applied to Hi-C data from breast cancer research, TranScan successfully identifies previously discovered translocation events and also suggests a new putative segment translocated between nonhomologous chromosomes.
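The general idea of a scan statistic can be sketched as sliding a fixed-size window over an inter-chromosomal contact matrix and recording the window with the largest total count. This is only a toy illustration under stated assumptions, not TranScan: the function name `max_window_scan`, the rectangular window, and the example matrix are hypothetical, and the significance assessment and false discovery control described above are not shown.

```python
def max_window_scan(matrix, height, width):
    """Return (best_sum, (row, col)) for the densest height x width window.

    matrix: list of equal-length rows of non-negative contact counts,
            e.g. bins of one chromosome versus bins of another.
    The first window reaching the maximum is reported on ties.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    best_sum, best_pos = -1, None
    for i in range(n_rows - height + 1):
        for j in range(n_cols - width + 1):
            # Total contacts inside the window anchored at (i, j).
            total = sum(
                matrix[r][c]
                for r in range(i, i + height)
                for c in range(j, j + width)
            )
            if total > best_sum:
                best_sum, best_pos = total, (i, j)
    return best_sum, best_pos


# Hypothetical 4 x 4 contact matrix with a dense block in the lower right.
contacts = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 7, 4],
]
best_sum, best_pos = max_window_scan(contacts, 2, 2)
```

In practice the observed maximum would be compared against its distribution under a null model of no clustering before a region is reported.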
Recommended Citation
Mao, Disheng, "Unsupervised Pattern Recognition on Large-scale Genomics Data" (2019). Doctoral Dissertations. 2236.
https://digitalcommons.lib.uconn.edu/dissertations/2236