Date of Completion

1-10-2019

Embargo Period

1-10-2019

Keywords

Metric Nets, Net-Trees, Randomized Incremental Construction, Hierarchical Structures, Topological Data Analysis, Computational Geometry

Major Advisor

Donald R. Sheehy

Associate Advisor

Thomas J. Peters

Associate Advisor

Sanguthevar Rajasekaran

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Abstract

Data volume is not the only problem in modern data analysis; data complexity is often the greater challenge. In many areas, such as computational biology, topological data analysis, and machine learning, the data resides in high-dimensional spaces that may not even be Euclidean. Processing such massive and complex data and extracting useful information from it is therefore a major challenge. Our methods apply to any data set given as a set of objects and a metric that measures the distance between them.

In this dissertation, we first consider the problem of preprocessing and organizing such complex data into a hierarchical data structure that allows efficient nearest neighbor and range queries. Many data structures have been proposed for general metric spaces, but almost all of them have construction time that can be quadratic in the number of points. Only two data structures have O(n log n) construction time, but both rely on very complex algorithms and analyses, and neither can be implemented efficiently. Here, we present a simple, randomized incremental algorithm that builds a metric data structure in O(n log n) time in expectation. Thus, we achieve the best of both worlds: a simple implementation with asymptotically optimal performance.

Furthermore, we consider the close relationship between our metric data structure and point orderings used in applications such as k-center clustering. We give linear-time algorithms to convert in both directions between these orderings and our metric data structure.
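
As an illustration of the kind of point ordering used in k-center clustering, the following minimal sketch computes a greedy (farthest-first) ordering by brute force. It is an assumption that this is the type of ordering meant here, and the quadratic-time loop below is not the linear-time conversion described in the dissertation. The names greedy_permutation and euclidean are hypothetical and chosen only for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance; any metric could be passed in its place."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def greedy_permutation(points, dist, start=0):
    """Return point indices in farthest-first order (brute force, O(n^2)).

    Illustrative sketch only: each step picks the point that is farthest
    from the set of points already chosen.
    """
    n = len(points)
    if n == 0:
        return []
    order = [start]
    chosen = [False] * n
    chosen[start] = True
    # nearest[i] is the distance from point i to its closest chosen point.
    nearest = [dist(points[start], p) for p in points]
    for _ in range(n - 1):
        nxt = max((i for i in range(n) if not chosen[i]),
                  key=lambda i: nearest[i])
        order.append(nxt)
        chosen[nxt] = True
        # Update the distance to the chosen set with the new point.
        for i in range(n):
            d = dist(points[nxt], points[i])
            if d < nearest[i]:
                nearest[i] = d
    return order


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (5, 0)]
    print(greedy_permutation(pts, euclidean))
```

The first k indices of such an ordering are the centers chosen by the classic farthest-first heuristic, which gives a 2-approximation for k-center clustering.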

In the last part, we use metric data structures to extract topological features of a data set, such as the number of connected components, holes, and voids. We give an efficient algorithm for constructing a (1 + epsilon)-approximation to the so-called Nerve filtration of a metric space, a fundamental tool in topological data analysis.
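
To make the simplest of these topological features concrete, the sketch below counts connected components at a single scale r by linking every pair of points whose distance is at most r and merging them with a union-find structure. This is a brute-force illustration under that assumed linking convention; it is not the approximation algorithm for the Nerve filtration described above, and the name count_components is hypothetical.

```python
import math


def count_components(points, dist, r):
    """Count connected components when points within distance r are linked.

    Brute-force sketch: examines all pairs, so it takes quadratic time.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components
                    components -= 1
    return components


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 0), (6, 0)]
    for scale in (0.5, 1, 4):
        print(scale, count_components(pts, math.dist, scale))
```

Running the count over increasing values of r shows components merging as the scale grows, which is the behavior a filtration tracks across all scales at once.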

Recommended Citation

Jahanseirroodsari, Mahmoodreza, "Hierarchical Structures for High Dimensional Data Analysis" (2019). Doctoral Dissertations. 2028.
https://digitalcommons.lib.uconn.edu/dissertations/2028
