RISS Search - Thesis Details

Korean Abstract

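As useful documents in digital form have multiplied and the need to organize them has grown, automatic classification has come to occupy an important place in information systems and data mining. Many machine learning algorithms have been applied to document classification. Most traditional document classification systems are based on a simple bag of words (vector) representation. This approach, however, has a high-dimensional feature space and ignores the relationships among words and among documents, which lowers classification efficiency and accuracy.
This thesis uses a concept-based classification algorithm that improves classification performance by means of statistics and a thesaurus. The algorithms used as document classifiers are K-Nearest Neighbor (KNN) and the Back Propagation Neural Network (BPNN). BPNN has been widely used in classification and pattern recognition. However, the standard BPNN generally suffers from slow training and from easily falling into local minima.
This thesis proposes two effective refinement methods for the BPNN algorithm and applies them to a concept-based classification system. The proposed methods speed up neural network training while reducing the tendency to fall into local minima. The experiments use the Reuters-21578 and 20 Newsgroups data sets. The precision, recall, and F-measure values measured in the experiments show that the proposed classification algorithm achieves high performance.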
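
The bag-of-words baseline and the KNN classifier named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the whitespace tokenizer, the cosine similarity measure, the majority vote, the function names, and the toy `TRAINING` documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a document to term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar training documents."""
    q = bag_of_words(query)
    ranked = sorted(
        training,
        key=lambda pair: cosine(q, bag_of_words(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; the thesis uses Reuters-21578 and 20 Newsgroups.
TRAINING = [
    ("oil prices rise as crude supply falls", "energy"),
    ("crude oil output cut by producers", "energy"),
    ("wheat harvest grows after heavy rain", "grain"),
    ("corn and wheat exports increase", "grain"),
]
```

Each distinct term is its own dimension here, so the vocabulary size sets the dimensionality and synonyms never match. These are the limitations the abstract attributes to bag-of-words representations.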

Multilingual Abstract

Due to the increased availability of documents in digital form and the ensuing need to organize them, automatic text categorization has gained a prominent status in the information systems and data mining fields. Many machine learning algorithms have been applied to text categorization tasks. Traditional text categorization systems are mostly based on the bag-of-words representation. However, this representation uses a high-dimensional feature space and ignores the relationships between terms and between documents, which decreases categorization efficiency and accuracy. In this dissertation, we use concept-based text categorization, which combines a statistical method with a thesaurus-based method, to improve categorization performance. We employ K-Nearest Neighbor (KNN) and the Back Propagation Neural Network (BPNN) as text classifiers. KNN is a simple and well-known approach to text categorization. BPNN has been widely used in classification and pattern recognition. However, the standard BPNN has some generally acknowledged limitations, such as slow training speed and a tendency to become trapped in local minima. This dissertation proposes two effective refinement strategies for BPNN and applies them to concept-based text categorization systems. These methods can speed up neural network training and alleviate the problem of being trapped in a local minimum. We conduct experiments on the standard Reuters-21578 and 20 Newsgroups data sets. Experimental results show that the proposed methods achieve high categorization effectiveness as measured by precision, recall, and F-measure.
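
The three evaluation measures named at the end of the abstract can be computed per category as in the sketch below. The abstract does not say which F-measure variant or averaging scheme the dissertation uses, so the balanced F1 form, the function name `precision_recall_f1`, and the sample labels are assumptions.

```python
def precision_recall_f1(true_labels, predicted_labels, category):
    """Return (precision, recall, F1) for one category.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = 2 * precision * recall / (precision + recall).
    A ratio with a zero denominator is reported as 0.0.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for five documents, not results from the dissertation.
TRUE = ["energy", "energy", "grain", "grain", "energy"]
PRED = ["energy", "grain", "grain", "energy", "energy"]
```

For the `energy` category in this sample there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.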

Table of Contents

1. Introduction
1.1 Background of Text Categorization
1.2 Neural Networks for Text Categorization
1.3 Main Contributions of this dissertation
1.4 Outline of this dissertation
2. Applications and General Approaches in Text Categorization
2.1 Applications of Text Categorization
2.1.1 Automatic indexing
2.1.2 Document organization
2.1.3 Document filtering
2.1.4 Word sense disambiguation
2.2 Text Categorization Approaches
2.2.1 Rocchio algorithm
2.2.2 Support vector machine algorithm
2.2.3 Decision tree classifiers
2.2.4 Instance-Based Classifiers
2.2.5 Inductive Rule learning
2.2.6 Expert Systems
3. Text Categorization Approaches used in this dissertation
3.1 K-Nearest Neighbor Algorithm
3.1.1 Basic K-Nearest Neighbor Algorithm
3.1.2 K-Nearest Neighbor Algorithm for Text Categorization
3.2 Artificial Neural Networks
3.3 Neural Network Topologies
3.3.1 Feed-forward networks
3.3.2 Recurrent networks
3.4 Learning in Artificial Neural Networks
3.4.1 Supervised learning
3.4.2 Unsupervised learning
3.5 Standard BPNN algorithm
3.6 The Refinement Strategies for BPNN
3.6.1 The main problems of the standard BPNN
3.6.2 Morbidity neuron rectified back-propagation neural network (MRBP network)
3.6.3 Learning phase evaluation back propagation neural network (LPEBP network)
4. Concept based methods for text categorization
4.1 Vector space model (VSM)
4.2 Singular value decomposition (SVD)
4.3 Automatic thesaurus construction (ACT)
4.4 Combination of Automatic thesaurus construction (ACT) with WordNet (WN)
4.4.1 WordNet
4.4.2 Combination and Term Expansion Method
5. System Overview
5.1 Data sets and document selection
5.1.1 Reuters data set
5.1.2 20 Newsgroups data set
5.1.3 Documents selected for experimentation
5.2 Text Pre-processing
5.2.1 Word Extraction
5.2.2 Stop words removal
5.2.3 Word stemming
5.2.4 Term weight
5.3 Feature Selection
5.3.1 Document Frequency method
5.3.2 Chi-square statistic (CHI)
5.3.3 Information gain (IG)
5.3.4 Mutual information (MI)
5.3.5 Odds ratio
5.3.6 Feature selection method used in this dissertation
5.4 Final Feature Weighting
5.4.1 Singular value decomposition
5.4.2 Automatically constructed thesaurus and WordNet
6. Experiments
6.1 Evaluation Measures
6.2 The Selection of Number of K for KNN
6.3 Hidden Nodes Determination for BPNN
6.4 Error Reduction for BPNN
6.5 Experimental Results
6.5.1 Singular value decomposition
6.5.2 Automatically Constructed Thesaurus (ACT)
6.5.3 Combination of ACT + WN
6.6 Computational Time Analysis
7. Conclusions and Discussions
APPENDIX A
List of Stop Words
APPENDIX B
20 Newsgroups data set
References

BPNN의 효율적인 개선방법 및 개념에 기초한 문서분류 시스템 응용 = Effective Refinement Strategies for BPNN and its application to concept based text categorization system
