RISS Search - Thesis Detail View

Multilingual Abstract

Although at least one hundred thousand genes had been predicted in the human genome, the Human Genome Project has so far reported only twenty-five to thirty thousand. This and other research suggests that, in highly complex organisms, not only genes but also the relationships among them, the so-called genetic network, play essential roles in expressing the phenomena of life. Over the course of evolution, the genetic network has differentiated into a modular form. A functional module of the genetic network is a tightly related group of genes, isolated within the network, that may carry out a specific biological function.
The aim of this research is to develop an efficient algorithm for identifying functional modules through cluster analysis of gene expression data. Many clustering algorithms have been applied to gene expression data, such as hierarchical clustering (Eisen et al., 1998), K-means clustering (Herwig et al., 1999), and self-organizing maps (SOM) (Tamayo et al., 1999). Most of them group genes according to the similarity or dissimilarity of their expression profiles, but they are not well suited to identifying functional modules because they do not consider the relationships between genes.
A clustering framework previously developed in our lab (Han, 2007; Kim, 2007) was modified and applied to cDNA microarray gene expression profiles of neural differentiation. It takes advantage of singular value decomposition (SVD), which detects biologically meaningful gene expression patterns; the dominant eigengene of each cluster clearly represents that cluster's coherent pattern.
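As a rough illustration of the eigengene idea, the following NumPy sketch computes the dominant eigengene of one cluster from a genes-by-time-points matrix. The function name `dominant_eigengene`, the per-gene centering, and the toy data are assumptions made for this example; the thesis's own preprocessing may differ.

```python
import numpy as np

def dominant_eigengene(expr):
    """Return the dominant eigengene of a genes x timepoints matrix and
    the fraction of variance it explains (illustrative helper)."""
    # Center each gene so the SVD captures profile shape, not baseline level.
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Rows of vt are the eigengenes (patterns across time points).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    eigengene = vt[0]
    # The SVD sign is arbitrary; align the eigengene with the mean profile.
    if np.dot(eigengene, centered.mean(axis=0)) < 0:
        eigengene = -eigengene
    return eigengene, explained

# Toy cluster (invented data): three genes rising over six time points.
t = np.arange(6, dtype=float)
cluster = np.vstack([1.0 * t, 2.0 * t + 1.0, 0.5 * t - 3.0])
eigengene, explained = dominant_eigengene(cluster)
print(eigengene, explained)
```

When the genes in a cluster share one coherent pattern, as in this toy case, the first eigengene accounts for nearly all of the variance, which is why it can stand in for the cluster as a whole.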
The improved clustering procedure has three steps. (1) It begins with a modified K-means that builds subsets of genes around dominant common expression patterns, according to the similarity between the genes' expression profiles. (2) A refinement algorithm then iteratively concentrates these subsets into subsets of genes that show a tightly coherent expression pattern. (3) In the enrichment step, genes missed earlier are added to clusters by estimating their covariance with each cluster.
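The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it uses plain K-means rather than the modified variant, Pearson correlation with the cluster's dominant eigengene as a stand-in for the covariance estimate, and a hypothetical threshold `min_corr`. All function names and the toy data are invented for the example.

```python
import numpy as np

def zscore_rows(x):
    """Standardize each gene profile to zero mean and unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return x / np.where(sd == 0, 1.0, sd)

def dominant_pattern(z):
    """Dominant eigengene of standardized profiles, sign-aligned to their mean."""
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    v = vt[0]
    return v if v @ z.mean(axis=0) >= 0 else -v

def similarity(z, pattern):
    """Pearson correlation of each standardized gene profile with a pattern."""
    p = (pattern - pattern.mean()) / pattern.std()
    return z @ p / z.shape[1]

def kmeans(z, k, n_iter=50, seed=0):
    """Step 1 (assumption: plain K-means with squared Euclidean distance)."""
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def refine(z, members, min_corr=0.9):
    """Step 2: drop the least coherent gene until all members fit the pattern."""
    members = list(members)
    while len(members) > 2:
        r = similarity(z[members], dominant_pattern(z[members]))
        worst = int(r.argmin())
        if r[worst] >= min_corr:
            break
        members.pop(worst)
    return members

def enrich(z, members, min_corr=0.9):
    """Step 3: add every gene that matches the pattern; clusters may overlap."""
    r = similarity(z, dominant_pattern(z[members]))
    return sorted(set(members) | set(np.flatnonzero(r >= min_corr).tolist()))

def find_modules(expr, k, min_corr=0.9):
    z = zscore_rows(np.asarray(expr, dtype=float))
    labels = kmeans(z, k)
    modules = []
    for j in range(k):
        members = np.flatnonzero(labels == j).tolist()
        if len(members) < 2:
            continue  # too small to define a pattern
        modules.append(enrich(z, refine(z, members, min_corr), min_corr))
    return modules

# Toy data (invented): four up-regulated and four down-regulated genes.
t = np.arange(6, dtype=float)
rng = np.random.default_rng(1)
up = np.array([a * t + b for a, b in [(1, 0), (2, 1), (0.5, -1), (3, 2)]])
down = np.array([-a * t + b for a, b in [(1, 5), (2, 9), (0.5, 0), (3, 1)]])
expr = np.vstack([up, down]) + rng.normal(scale=0.05, size=(8, 6))
modules = find_modules(expr, k=2)
print(modules)
```

Because enrichment scans all genes rather than only unassigned ones, a gene can end up in more than one module, which matches the overlapping clustering described for this step.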
As a result, 134 clusters were obtained from the expression data of 6,331 genes, and the biological function of each cluster, treated as a module, was predicted with Gene Ontology (GO). In particular, genes related to neurogenesis were significantly concentrated in Clusters 1 and 2. Cluster 1 showed an increasing expression pattern and included Notch1, Ncdn, Unc5b, and others. Cluster 2, on the other hand, showed a down-regulated pattern and included Nrp2, Hey1, Zfhx3, and others.
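The abstract does not say which statistical test was used to judge that genes were "significantly" concentrated in a cluster. One common choice for GO term over-representation is the hypergeometric test, sketched below in pure Python. The function name and the toy counts are assumptions for illustration and are not results from the thesis.

```python
from math import comb

def hypergeom_enrichment_p(n_genes, n_annotated, cluster_size, hits):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from n_genes, of which n_annotated carry the GO term of interest."""
    upper = min(n_annotated, cluster_size)
    total = comb(n_genes, cluster_size)
    tail = sum(
        comb(n_annotated, x) * comb(n_genes - n_annotated, cluster_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / total

# Toy counts (invented): 10 genes in total, 3 annotated with the term,
# and a cluster of 4 genes that contains 2 of the annotated ones.
p_value = hypergeom_enrichment_p(n_genes=10, n_annotated=3, cluster_size=4, hits=2)
print(p_value)
```

In practice the background would be all genes on the array, and p-values across many clusters and GO terms would need a multiple-testing correction.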
Investigation of the biological functions of each cluster indicated that this clustering framework is a useful tool for analyzing gene expression data and identifying functional modules of genes.


Korean Abstract

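As the Human Genome Project neared completion, only twenty-five to thirty thousand genes could be identified, contrary to the expectations of most life scientists. This suggests that in highly evolved organisms, not only the genes themselves but also the relationships among them, that is, the gene network, play a key role in expressing the phenomena of life. In the course of evolution, the parts of the gene network that carry out particular biological phenomena became modularized, and a gene module can be regarded as a functional unit of genes associated with a biological phenomenon. Gene clustering should therefore aim to trace gene modules. Many clustering algorithms have been developed to date, including hierarchical clustering, K-means clustering, and SOM, but they cluster genes according to expression patterns rather than functional modularity. In this study, we applied the pattern-extraction technique of singular value decomposition (SVD) to the existing K-means algorithm, and developed, improved, and implemented a method that analyzes and clusters the functional patterns of genes.
The clustering algorithm can be summarized as follows. 1) First, SVD is performed to extract eigengenes, and K-means is run using these eigengenes as the reference. This step quickly produces clusters of genes with similar expression patterns. 2) Among the genes in each cluster produced by the previous step, those weakly associated with the cluster are removed one at a time, which increases the cluster's concentration. 3) Finally, in the enrichment step, overlapping clustering is performed: genes missed during K-means are added, and genes involved in multiple processes are included in clusters.
The clustering program developed and improved in this study was applied to gene expression data observed on Days 1, 2, 3, 6, 12, and 15 of the neural differentiation of mouse embryonic stem cells, and it separated 6,331 genes into 134 clusters. Examining how the clusters were distributed across the biological and biochemical function categories of Gene Ontology (GO) showed that genes were clustered not only by similarity of expression pattern but also by biological and biochemical function.
These results suggest that this cluster analysis technique can separate, in gene expression data, genes related to the expression of the same biological function, and that it can identify gene clusters as the basic modules of the gene network underlying living systems.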


Table of Contents