Date of Completion
3-8-2020
Embargo Period
2-17-2020
Keywords
Closest Pair; Approximate Algorithm; Deterministic Algorithm; Dynamic Time Warping; Neural Network
Major Advisor
Sanguthevar Rajasekaran
Associate Advisor
Yufeng Wu
Associate Advisor
Song Han
Field of Study
Computer Science and Engineering
Degree
Doctor of Philosophy
Open Access
Open Access
Abstract
The Closest Pair problem aims to identify the closest pair of points in a metric space under some distance measure (e.g., Euclidean distance or Dynamic Time Warping distance). It is a fundamental problem with a wide range of applications in data mining, since most data can be represented as vectors in a high-dimensional space, and we would like to identify the relationships among those data points. Typical applications include, but are not limited to, social data analysis, user pattern identification, motif mining in biological data, and data clustering. This is a classical problem that has been studied extensively over the past decades.
In this thesis, we study the Closest Pair problem and its variants, and also bring a machine learning perspective to some closely related problems. In particular, we propose two approximate algorithms that efficiently address the Closest Pair of Points (CPP) problem, and one deterministic approach to the Closest Pair of Subsequences (CPS) problem, using the Euclidean distance measure. In addition, to identify the closest subsequences in time series data, we propose a learnable feature extractor, embedded in an artificial neural network, that learns patterns with respect to the Dynamic Time Warping distance. Finally, to speed up inference for the proposed algorithm, we also propose a neural network pruning technique that yields a smaller network with similar capacity.
All the proposed methods are shown to achieve state-of-the-art performance on various standard benchmark datasets.
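To make the problem statement above concrete, the following is a minimal sketch of the exhaustive-search baseline for the Closest Pair problem, with a pluggable distance function. It is not one of the algorithms proposed in the thesis; it only illustrates the quadratic-time definition that faster methods aim to improve on. The function names (`euclidean`, `dtw`, `closest_pair`) are hypothetical, and the Dynamic Time Warping routine is the textbook dynamic program with an absolute-difference local cost, which is an assumption made here for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook DTW between two 1-D sequences (assumed |x - y| local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match to b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match to a[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def closest_pair(items, dist):
    """Return (distance, i, j) for the closest pair, by checking all pairs."""
    if len(items) < 2:
        raise ValueError("need at least two items")
    return min((dist(items[i], items[j]), i, j)
               for i, j in combinations(range(len(items)), 2))


if __name__ == "__main__":
    points = [(0, 0), (5, 5), (1, 0), (9, 9)]
    print(closest_pair(points, euclidean))

    series = [[0, 1, 2, 3], [5, 5, 5, 5], [0, 0, 1, 2, 3]]
    print(closest_pair(series, dtw))
```

The exhaustive search evaluates every one of the n(n-1)/2 pairs, which is what makes approximate or more carefully designed deterministic approaches attractive on large or high-dimensional inputs. Note that the DTW routine accepts sequences of different lengths, whereas the Euclidean routine assumes equal-length vectors.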
Recommended Citation
Cai, Xingyu, "Effective Algorithms for the Closest Pair and Related Problems" (2020). Doctoral Dissertations. 2423.
https://digitalcommons.lib.uconn.edu/dissertations/2423