RISS Search - Thesis Details

Multilingual Abstract

Cancer is a class of complex genetic diseases characterized by uncontrolled cell growth, and cancer classification is a crucial research topic in cancer treatment. Over the last decade, mRNA expression profiling with microarrays has been widely used to classify different types of human cancer. However, microarray data pose a severe challenge for computational techniques: a dataset typically covers thousands or tens of thousands of genes, of which only a few are related to the target cancer. Dimension reduction techniques that identify a small set of genes are therefore needed to achieve better learning performance. From a machine learning perspective, gene selection is a feature selection problem: finding a small subset of features that carries the most discriminative information about the target.
In this thesis, we propose an Ensemble Correlation-Based Gene Selection (ECBGS) algorithm based on symmetrical uncertainty (SU) and the support vector machine (SVM). In our method, SU measures the relevance of each gene, different starting points within the relevant set are used to generate candidate gene subsets, and the SVM serves as the evaluation criterion of the wrapper.
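As an illustration of the relevance measure only, the following is a minimal sketch of symmetrical uncertainty in its usual form, SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)). It assumes that expression values have already been discretized; the function names, the `rank_genes` helper, and the toy data are hypothetical and are not taken from the thesis, and the subset generation and SVM wrapper steps of ECBGS are not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); the result lies in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so treat as uninformative
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy      # information gain (mutual information)
    return 2.0 * info_gain / (h_x + h_y)


def rank_genes(expr, labels, threshold=0.0):
    """Hypothetical helper: genes with SU above threshold, most relevant first."""
    scores = {gene: symmetrical_uncertainty(vals, labels)
              for gene, vals in expr.items()}
    kept = [(gene, su) for gene, su in scores.items() if su > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy, already-discretized expression levels (assumed data, not from the thesis).
labels = ["tumor", "tumor", "normal", "normal"]
expr = {
    "gene_a": ["high", "high", "low", "low"],    # tracks the class exactly
    "gene_b": ["high", "low", "high", "low"],    # independent of the class
    "gene_c": ["high", "high", "high", "low"],   # partially informative
}
ranked = rank_genes(expr, labels)
print(ranked)
```

In this sketch a gene whose SU with the class label is zero is dropped before any subset search, which is the role the abstract assigns to the relevance analysis.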
In the experiments, we used six freely available benchmark datasets to evaluate our method with classifiers trained under both 10-fold cross-validation and different dataset sizes. The results show that the classification model built with the proposed gene selection algorithm achieves higher prediction accuracy than other algorithms, and that the method still achieves high accuracy when the number of training instances is small. Compared with other methods published in the literature, our method yields good results.
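The evaluation protocol can be sketched as follows. This is a minimal, unstratified 10-fold cross-validation loop written under stated assumptions: the classifier is passed in as a callback, and the `nearest_neighbor` function is a simple 1-nearest-neighbor placeholder standing in for the SVM, not the classifier used in the thesis. All names and the toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(samples, labels, train_and_predict, k=10, seed=0):
    """Pooled accuracy: each sample is predicted once, while held out."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in held_out],
        )
        correct += sum(p == labels[i] for p, i in zip(predictions, held_out))
    return correct / len(samples)


def nearest_neighbor(train_x, train_y, test_x):
    """Placeholder classifier (1-nearest neighbor), standing in for the SVM."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [
        train_y[min(range(len(train_x)), key=lambda j: dist(train_x[j], x))]
        for x in test_x
    ]


# Toy data: two well-separated groups of samples (assumed, not from the thesis).
samples = [(float(i), 0.0) for i in range(10)] + \
          [(100.0 + i, 0.0) for i in range(10)]
labels = ["normal"] * 10 + ["tumor"] * 10
accuracy = cross_val_accuracy(samples, labels, nearest_neighbor, k=10)
print(accuracy)
```

Passing the classifier as a callback keeps the fold logic independent of the model, so an SVM implementation could be substituted without changing the loop. In a wrapper setting, gene selection would need to be repeated inside each training fold to avoid leaking information from the held-out samples.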
ECBGS can potentially be used in miRNA expression profiling for cancer classification. Moreover, we believe that our mechanism is also applicable to other feature selection problems and can be extended to the classification of other disease states.

Korean Abstract

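Cancer classification is one of the most fundamental and important research areas in cancer treatment. Over the last decade, mRNA expression profiles obtained with microarrays have been used to classify various types of cancer. Microarray data usually consist of thousands or tens of thousands of genes, but few of them are related to the target cancer. Selecting informative genes is therefore an essential step in microarray data analysis. From a machine learning perspective, selecting informative genes can be regarded as a feature selection problem: choosing the subset of features that is relevant to the classification.
In this thesis, we propose a correlation-based gene selection algorithm that uses symmetrical uncertainty and a support vector machine. In the proposed algorithm, symmetrical uncertainty measures the relevance of the genes, gene subsets are generated using different genes of the relevant set as starting points, and the support vector machine serves as the subset evaluation algorithm.
To verify the effectiveness of the proposed algorithm, we used six publicly available gene expression datasets, together with 10-fold cross-validation and datasets of various sizes. The classification model built with the proposed gene selection method showed higher accuracy than other algorithms and also performed well when the dataset was small.
The proposed ECBGS can also be used for cancer classification with miRNA expression data. In addition, the algorithm can be applied to other kinds of feature selection problems and extended to problems of classifying various disease states.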

Table of Contents

Chapter 1 Introduction 1
Chapter 2 Related Work 4
2.1 Feature Selection 4
2.2 Gene Selection 5
Chapter 3 Methodology 7
3.1 Fast Correlation-Based Filter 8
3.2 Ensemble Correlation-Based Gene Selection 9
3.3 Support Vector Machine 14
Chapter 4 Experiments and Results 16
4.1 Dataset Description 16
4.2 Parameter Settings for SVM 18
4.3 Performance Evaluation 19
4.4 Discussion 32
Chapter 5 Conclusion 36
References 38
Abstract (Korean) 43

ECBGS = Ensemble Correlation-Based Gene Selection for Cancer Classification
