Spectral clustering is one of the representative graph-based clustering methods. It is widely used in various fields because it can cluster datasets of any data type and find non-convex-shaped clusters. However, spectral clustering has a chronic problem: it is sensitive to noise. This sensitivity makes spectral clustering impractical for real-world applications with highly noisy data. To address this problem, many researchers have proposed robust spectral clustering methods. However, these methods have limited robustness against noise because they do not provide solutions tailored to the challenges posed by each noise type.
In this dissertation, we divide noise into two types, internal and external, and define the challenge each type poses to spectral clustering. To deal with the different problems caused by the two noise types, we propose a novel robust spectral clustering method named KNN-RSC. The proposed KNN-RSC filters out potential external noise, which appears as relatively sparse data, using k-nearest-neighbor-based density estimation. Then, KNN-RSC constructs a filtered density-based affinity graph from a nearest-neighbor graph. By adaptively scaling each connected component of the nearest-neighbor graph based on the local densities of its vertices, the filtered density-based affinity graph captures the cluster structure, which is otherwise obscured by internal noise. In addition, KNN-RSC finds clusters of varying sizes, shapes, and densities by solving the graph-cut problem on the filtered density-based affinity graph. In experiments on real-world datasets, KNN-RSC achieves clustering accuracy from 1.2 to 2.1 times that of existing robust spectral clustering methods.
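The two key steps described above, density-based filtering of external noise and construction of a locally scaled nearest-neighbor affinity graph, can be sketched as follows. This is a minimal illustration, not the dissertation's actual implementation: the function names, the density estimate (inverse mean k-NN distance), the filtering quantile, and the self-tuning-style local scale (distance to the k-th neighbor) are all illustrative assumptions.

```python
import numpy as np

def knn_distances(X, k):
    # Pairwise Euclidean distances; return each point's k nearest-neighbor
    # distances and the corresponding neighbor indices.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # exclude self-distance
    idx = np.argsort(D, axis=1)[:, :k]
    return np.take_along_axis(D, idx, axis=1), idx

def filter_external_noise(X, k=10, quantile=0.1):
    # Density estimate: inverse of the mean k-NN distance (an assumed choice).
    # Points in the lowest-density quantile are treated as external noise.
    d, _ = knn_distances(X, k)
    density = 1.0 / (d.mean(axis=1) + 1e-12)
    keep = density >= np.quantile(density, quantile)
    return X[keep], keep

def local_scaled_affinity(X, k=10):
    # k-NN affinity graph with a per-point scale sigma_i set to the distance
    # to the k-th neighbor, so edge weights adapt to local density.
    d, idx = knn_distances(X, k)
    sigma = d[:, -1]
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        for d_ij, j in zip(d[i], idx[i]):
            w = np.exp(-d_ij**2 / (sigma[i] * sigma[j] + 1e-12))
            W[i, j] = W[j, i] = max(W[i, j], w)  # symmetrize
    return W
```

The resulting affinity matrix `W` can then be handed to a standard spectral clustering routine (e.g. one accepting a precomputed affinity) to solve the graph-cut step.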
However, KNN-RSC often produces impractical clustering results for high-dimensional data due to the "curse of dimensionality." To alleviate this problem, in this dissertation, we propose KNN-SSC, which combines KNN-RSC with subspace learning to improve clustering accuracy for high-dimensional data. KNN-SSC effectively alleviates the "curse of dimensionality" by learning a low-dimensional subspace for each cluster. In particular, KNN-SSC learns subspaces that reflect density-based similarity relationships, reducing the influence of internal and external noise via the filtered density-based affinity graph of KNN-RSC. By integrating the advantages of KNN-RSC and subspace learning, KNN-SSC achieves clustering accuracy up to 1.2 times that of the existing state-of-the-art subspace clustering method. In particular, for high-dimensional datasets, KNN-SSC achieves clustering accuracy from 1.2 to 2.0 times that of KNN-RSC. To demonstrate the broad applicability of the proposed methods, we apply KNN-RSC and KNN-SSC to two deep learning applications: action recognition and image classification.
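The idea of representing each cluster by a low-dimensional subspace can be illustrated with a simple sketch. This is not KNN-SSC itself; it is a generic per-cluster PCA scheme under assumed function names, shown only to make the subspace-learning intuition concrete: each cluster is modeled as a mean plus a few principal directions, and points are assigned to the subspace with the smallest reconstruction error.

```python
import numpy as np

def fit_cluster_subspaces(X, labels, dim=2):
    # For each cluster, model its points as a mean plus the top `dim`
    # principal directions (via SVD of the centered cluster data).
    subspaces = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        subspaces[c] = (mu, Vt[:dim])
    return subspaces

def assign_by_residual(X, subspaces):
    # Assign each point to the cluster whose subspace reconstructs it
    # with the smallest residual norm.
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        best, best_err = -1, np.inf
        for c, (mu, V) in subspaces.items():
            r = x - mu
            err = np.linalg.norm(r - V.T @ (V @ r))  # residual off the subspace
            if err < best_err:
                best, best_err = c, err
        labels[i] = best
    return labels
```

In high dimensions, comparing points by their distance to cluster-specific low-dimensional subspaces is far more discriminative than full-dimensional Euclidean distance, which is the intuition behind coupling KNN-RSC with subspace learning.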