RISS Search - Domestic Journal Article Details

Korean Abstract

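Much of the information collected during machine learning is either irrelevant to the concept to be learned or redundant. The data themselves may also contain errors. If the information collected to build a learning model is unreliable in this way, it is difficult to acquire accurate knowledge during learning. Machine learning therefore uses feature selection to acquire accurate knowledge during the learning process. Feature selection improves the performance of a learning algorithm by removing information that is irrelevant to the target classes or redundant before the learning model is built. Existing feature selection methods assume that documents are evenly distributed in order to select suitable features. In practice, however, this is not the case, and it is also very difficult to prepare training examples in which the number of documents or the document lengths are all equal.
In this paper, we propose a feature selection method that considers the per-class impurity of words and the unbalanced distribution of documents in order to select features more efficiently. Feature candidates that can represent a class are obtained by measuring word impurity, and features are then selected by taking the unbalanced distribution of documents into account. Experiments demonstrate that the method achieves better performance.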

Multilingual Abstract

Training data for machine learning often contain irrelevant or redundant information, and the original data may also include noise. If the information collected to construct a learning model is not reliable, it is difficult to obtain accurate knowledge from it. During the learning phase, the system attempts to find relations or regularities between features and categories. Feature selection removes irrelevant or redundant information before the learning model is constructed in order to improve its performance. Existing feature selection methods assume that the distribution of documents is balanced in terms of the number of documents in each class and the length of each document. In practice, however, it is difficult to prepare a set of documents of almost equal length, and equally difficult to define classes that each contain a fixed number of documents.
In this paper, we propose a new feature selection method that considers the impurity of words and the unbalanced distribution of documents across categories. Feature candidates are obtained using word impurity, and the final features are selected by taking the unbalanced distribution of documents into account. Experiments show that our method performs better than existing methods.
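
The abstract does not state the impurity measure or how the unbalanced distribution enters the selection, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes Gini impurity over a word's class distribution and assumes that class imbalance is handled by dividing each per-class document frequency by the class size. The function names `word_impurity_scores` and `select_features` are hypothetical.

```python
from collections import Counter, defaultdict


def word_impurity_scores(docs, labels):
    """Return {word: Gini impurity of the word's class distribution}.

    Assumption (not from the paper): per-class document frequencies are
    divided by the class size so that large classes do not dominate.
    """
    class_sizes = Counter(labels)
    # df[word][cls] = number of documents of class cls that contain word
    df = defaultdict(Counter)
    for tokens, cls in zip(docs, labels):
        for word in set(tokens):
            df[word][cls] += 1

    scores = {}
    for word, per_class in df.items():
        # Fraction of each class's documents that contain the word.
        rates = [per_class[c] / class_sizes[c] for c in class_sizes]
        total = sum(rates)
        probs = [r / total for r in rates]
        # Gini impurity: 0 when the word occurs in a single class only.
        scores[word] = 1.0 - sum(p * p for p in probs)
    return scores


def select_features(docs, labels, k):
    """Pick the k words with the lowest impurity (ties broken alphabetically)."""
    scores = word_impurity_scores(docs, labels)
    return sorted(scores, key=lambda w: (scores[w], w))[:k]
```

With three "crude" documents and one "grain" document, a word that appears in one document of each class gets an impurity of 0.5 from raw counts, but 0.375 after the class-size normalization, because it covers the whole small class and only a third of the large one. Words that occur in a single document are trivially pure under this sketch, so a practical variant would likely combine impurity with a frequency criterion.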

Table of Contents

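Summary
Abstract
1. Introduction
2. Related Work
3. Problems with Existing Feature Selection Methods
4. Feature Selection Method Considering Word Impurity and Unbalanced Distribution
5. Experiments and Performance Evaluation
6. Conclusion and Future Work
References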

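Date	Type	Details
2014-09-01	Evaluation	Journal merged (other)
2013-04-26	Journal title change	Korean title: 정보과학회논문지 : 소프트웨어 및 응용; foreign-language title: Journal of KIISE : Software and Applications
2011-01-01	Evaluation	Accredited journal status maintained
2009-01-01	Evaluation	Accredited journal status maintained
2008-10-17	Journal title change	Korean title: 정보과학회논문지 : 소프트웨어 및 응용; foreign-language title: Journal of KISS : Software and Applications
2007-01-01	Evaluation	Accredited journal status maintained
2005-01-01	Evaluation	Accredited journal status maintained
2002-01-01	Evaluation	Selected as accredited journal (second-round candidate)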

문서의 불균등 분포를 고려한 단어 불순도 기반 특징 선택 방법 = An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents
