Date of Completion
1-10-2019
Embargo Period
1-10-2019
Keywords
Metric Nets, Net-Trees, Randomized Incremental Construction, Hierarchical Structures, Topological Data Analysis, Computational Geometry
Major Advisor
Donald R. Sheehy
Associate Advisor
Thomas J. Peters
Associate Advisor
Sanguthevar Rajasekaran
Field of Study
Computer Science and Engineering
Degree
Doctor of Philosophy
Open Access
Open Access
Abstract
The volume of data is not the only problem in modern data analysis; data complexity is often more challenging. In many areas, such as computational biology, topological data analysis, and machine learning, the data resides in high-dimensional spaces that may not even be Euclidean. Processing such massive and complex data and extracting useful information from it is therefore a major challenge. Our methods apply to any data set given as a set of objects and a metric that measures the distance between them.
In this dissertation, we first consider the problem of preprocessing and organizing such complex data into a hierarchical data structure that allows efficient nearest neighbor and range queries. Many data structures exist for general metric spaces, but almost all of them have construction time that can be quadratic in the number of points. Only two data structures have O(n log n) construction time, but both have very complex algorithms and analyses, and they cannot be implemented efficiently. Here, we present a simple, randomized incremental algorithm that builds a metric data structure in O(n log n) expected time. Thus, we achieve the best of both worlds: a simple implementation with asymptotically optimal performance.
Furthermore, we consider the close relationship between our metric data structure and the point orderings used in applications such as k-center clustering. We give linear-time algorithms to convert between these orderings and our metric data structure in both directions.
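As an illustration of the kind of point ordering used for k-center clustering, the following is a minimal sketch of the classical farthest-first (greedy) ordering. It is a brute-force quadratic-time version written only for illustration; it is not the linear-time conversion developed in the dissertation, and the function name `greedy_permutation` and its parameters are assumptions of this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_permutation(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    start: int = 0,
) -> List[int]:
    """Return indices of `points` in farthest-first order (hypothetical helper).

    Each new point is the one farthest from all points chosen so far.
    This naive version makes O(n^2) calls to the metric `dist`.
    """
    n = len(points)
    if n == 0:
        return []

    order = [start]
    chosen = [False] * n
    chosen[start] = True
    # nearest[i] is the distance from points[i] to its closest chosen point.
    nearest = [dist(points[start], p) for p in points]

    for _ in range(n - 1):
        # Pick the unchosen point that is farthest from the chosen set.
        nxt = max((i for i in range(n) if not chosen[i]), key=lambda i: nearest[i])
        order.append(nxt)
        chosen[nxt] = True
        # Update distances to the chosen set with the newly added point.
        for i in range(n):
            d = dist(points[nxt], points[i])
            if d < nearest[i]:
                nearest[i] = d
    return order


if __name__ == "__main__":
    pts = [0, 10, 1, 9, 5]
    print(greedy_permutation(pts, lambda a, b: abs(a - b)))
```

Only a distance function is required, so the same sketch works for non-Euclidean data. In the standard analysis of this ordering, the first k points serve as centers for a 2-approximate k-center clustering.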
In the last part, we use metric data structures to extract topological features of a data set, such as the number of connected components, holes, and voids. We give an efficient algorithm for constructing a (1 + epsilon)-approximation to the so-called Nerve filtration of a metric space, a fundamental tool in topological data analysis.
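To make the simplest of these topological features concrete, the sketch below counts connected components at a fixed scale r by joining every pair of points at distance at most r and merging them with a union-find structure. This is a brute-force illustration under assumed names (`count_components`, `r`); it does not construct or approximate the Nerve filtration, and it does not detect holes or voids.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def count_components(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    r: float,
) -> int:
    """Count connected components of the graph joining points within distance r.

    Hypothetical helper: checks all O(n^2) pairs and merges with union-find.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components
                    components -= 1
    return components


if __name__ == "__main__":
    pts = [0, 1, 2, 10, 11]
    for scale in (0.5, 1, 8):
        print(scale, count_components(pts, lambda a, b: abs(a - b), scale))
```

Sweeping r from small to large shows components merging as the scale grows, which is the zero-dimensional part of what a filtration records.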
Recommended Citation
Jahanseirroodsari, Mahmoodreza, "Hierarchical Structures for High Dimensional Data Analysis" (2019). Doctoral Dissertations. 2028.
https://digitalcommons.lib.uconn.edu/dissertations/2028