Document Type

Article

Major

Statistics

Mentor

Prof. Haim Bar, Dept. of Statistics

Disciplines

Physical Sciences and Mathematics | Statistics and Probability

Abstract

Multidimensional scaling (MDS), in which high-dimensional data are projected onto a lower-dimensional map, is often followed by clustering in the reduced space. To examine the effect of MDS on clustering, we simulate several data structures and apply a range of clustering methods, including topological data analysis. We first cluster the data in the original, high-dimensional space, then use MDS to scale the data down to a lower dimension, cluster the scaled data, and compare the two sets of results. We find that MDS can often decrease clustering performance and is unable to correctly represent data structures with distinctive shapes or noise. The shape and noise of the data also greatly affect clustering performance, and some clustering methods perform noticeably differently depending on the shape of the data. Topological data analysis in particular is more successful at clustering data with clear structure.
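
The workflow described in the abstract (cluster in the original space, apply MDS, cluster again, compare) can be sketched roughly as follows. This is only an illustrative outline, not the authors' code: the use of scikit-learn, the Gaussian-blob simulation, k-means as the clustering method, the adjusted Rand index as the comparison measure, and the function name compare_clustering are all assumptions made for the example. Topological data analysis is not covered here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS
from sklearn.metrics import adjusted_rand_score


def compare_clustering(n_samples=150, n_features=10, random_state=0):
    """Hypothetical helper: cluster before and after MDS and score both.

    Returns (ari_high, ari_low, X_low), where each ARI compares the
    k-means labels against the simulated ground-truth labels.
    """
    # Assumed simulation: three well-separated Gaussian blobs in 10 dimensions.
    centers = np.zeros((3, n_features))
    centers[1, 0] = 10.0
    centers[2, 1] = 10.0
    X_high, y_true = make_blobs(
        n_samples=n_samples,
        centers=centers,
        cluster_std=1.0,
        random_state=random_state,
    )

    # Step 1: cluster in the original, high-dimensional space.
    km_high = KMeans(n_clusters=3, n_init=10, random_state=random_state)
    labels_high = km_high.fit_predict(X_high)

    # Step 2: scale the data down to two dimensions with (metric) MDS.
    mds = MDS(n_components=2, n_init=4, random_state=random_state)
    X_low = mds.fit_transform(X_high)

    # Step 3: cluster the scaled data with the same method and settings.
    km_low = KMeans(n_clusters=3, n_init=10, random_state=random_state)
    labels_low = km_low.fit_predict(X_low)

    # Step 4: compare both clusterings against the known labels.
    ari_high = adjusted_rand_score(y_true, labels_high)
    ari_low = adjusted_rand_score(y_true, labels_low)
    return ari_high, ari_low, X_low


if __name__ == "__main__":
    ari_high, ari_low, _ = compare_clustering()
    print(f"ARI in original space: {ari_high:.3f}")
    print(f"ARI after MDS:         {ari_low:.3f}")
```

Swapping the blob simulation for data with a more structured shape or added noise, and k-means for other clustering methods, would be the natural way to probe the effects the abstract reports.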
