Spectral clustering is one of the representative graph-based clustering methods. It is widely used in various fields because it can cluster datasets of any data type and find non-convex-shaped clusters. However, spectral clustering has a chronic problem: it is sensitive to noise. This sensitivity makes spectral clustering impractical for real-world applications with highly noisy data. To address this problem, many researchers have proposed robust spectral clustering methods. However, these methods have limited robustness against noise because they do not provide solutions tailored to the challenges posed by each noise type.
In this dissertation, we divide noise into two types, internal and external, and define the challenge each type poses to spectral clustering. To deal with the different problems caused by the two noise types, we propose a novel robust spectral clustering method named KNN-RSC. The proposed KNN-RSC filters out potential external noise, which appears as relatively sparse data, using k-nearest-neighbor-based density estimation. Then, KNN-RSC constructs a filtered density-based affinity graph from a nearest-neighbor graph. By adaptively scaling each connected component of the nearest-neighbor graph based on the local densities of its vertices, the filtered density-based affinity graph captures the cluster structure, which is otherwise obscured by internal noise. In addition, KNN-RSC finds clusters of varying sizes, shapes, and densities by solving the graph-cut problem on the filtered density-based affinity graph. In experiments on real-world datasets, KNN-RSC achieves clustering accuracy from 1.2 to 2.1 times that of existing robust spectral clustering methods.
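The two key steps described above, density-based filtering of external noise and construction of a locally scaled nearest-neighbor affinity graph, can be sketched as follows. This is a minimal illustration, not the dissertation's actual implementation: the function names, the density estimate (inverse mean k-NN distance), the filtering quantile, and the self-tuning-style local scale (distance to the k-th neighbor) are all illustrative assumptions.

```python
import numpy as np

def knn_distances(X, k):
    # Pairwise Euclidean distances; return each point's k nearest-neighbor
    # distances and the corresponding neighbor indices.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # exclude self-distance
    idx = np.argsort(D, axis=1)[:, :k]
    return np.take_along_axis(D, idx, axis=1), idx

def filter_external_noise(X, k=10, quantile=0.1):
    # Density estimate: inverse of the mean k-NN distance (an assumed choice).
    # Points in the lowest-density quantile are treated as external noise.
    d, _ = knn_distances(X, k)
    density = 1.0 / (d.mean(axis=1) + 1e-12)
    keep = density >= np.quantile(density, quantile)
    return X[keep], keep

def local_scaled_affinity(X, k=10):
    # k-NN affinity graph with a per-point scale sigma_i set to the distance
    # to the k-th neighbor, so edge weights adapt to local density.
    d, idx = knn_distances(X, k)
    sigma = d[:, -1]
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        for d_ij, j in zip(d[i], idx[i]):
            w = np.exp(-d_ij**2 / (sigma[i] * sigma[j] + 1e-12))
            W[i, j] = W[j, i] = max(W[i, j], w)  # symmetrize
    return W
```

The resulting affinity matrix `W` can then be handed to a standard spectral clustering routine (e.g. one accepting a precomputed affinity) to solve the graph-cut step.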
However, KNN-RSC often produces impractical clustering results for high-dimensional data due to the "curse of dimensionality." To alleviate this problem, in this dissertation, we propose KNN-SSC, which combines KNN-RSC with subspace learning to improve clustering accuracy for high-dimensional data. KNN-SSC effectively alleviates the "curse of dimensionality" by learning a low-dimensional subspace for each cluster. In particular, KNN-SSC learns subspaces that reflect density-based similarity relationships, reducing the influence of internal and external noise via the filtered density-based affinity graph of KNN-RSC. By integrating the advantages of KNN-RSC and subspace learning, KNN-SSC achieves clustering accuracy up to 1.2 times that of the existing state-of-the-art subspace clustering method. In particular, for high-dimensional datasets, KNN-SSC achieves clustering accuracy from 1.2 to 2.0 times that of KNN-RSC. To demonstrate the broad applicability of the proposed methods, we apply KNN-RSC and KNN-SSC to two deep learning applications: action recognition and image classification.
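The idea of representing each cluster by a low-dimensional subspace can be illustrated with a simple sketch. This is not KNN-SSC itself; it is a generic per-cluster PCA scheme under assumed function names, shown only to make the subspace-learning intuition concrete: each cluster is modeled as a mean plus a few principal directions, and points are assigned to the subspace with the smallest reconstruction error.

```python
import numpy as np

def fit_cluster_subspaces(X, labels, dim=2):
    # For each cluster, model its points as a mean plus the top `dim`
    # principal directions (via SVD of the centered cluster data).
    subspaces = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        subspaces[c] = (mu, Vt[:dim])
    return subspaces

def assign_by_residual(X, subspaces):
    # Assign each point to the cluster whose subspace reconstructs it
    # with the smallest residual norm.
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        best, best_err = -1, np.inf
        for c, (mu, V) in subspaces.items():
            r = x - mu
            err = np.linalg.norm(r - V.T @ (V @ r))  # residual off the subspace
            if err < best_err:
                best, best_err = c, err
        labels[i] = best
    return labels
```

In high dimensions, comparing points by their distance to cluster-specific low-dimensional subspaces is far more discriminative than full-dimensional Euclidean distance, which is the intuition behind coupling KNN-RSC with subspace learning.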