Although at least one hundred thousand genes had been predicted in humans, the Human Genome Project has so far reported only twenty-five to thirty thousand. Reports from that project and from other research suggest that, in highly complex organisms, life phenomena depend not only on genes but also on the relationships among them, the so-called genetic network. Over the course of evolution, genetic networks have differentiated into a modular form. A functional module of a genetic network is a tightly related group of genes that is isolated within the network and may carry out a specific biological function.
The aim of this research is to develop an efficient algorithm for identifying functional modules through cluster analysis of gene expression data. Many clustering algorithms have been applied to gene expression data, such as hierarchical clustering (Eisen et al., 1998), K-means clustering (Herwig et al., 1999), and self-organizing maps (SOM) (Tamayo et al., 1999). Most of them group genes by the similarity or dissimilarity of their expression profiles, but they are not well suited to identifying functional modules because they do not consider the relationships between genes.
A clustering framework previously developed in our lab (Han, 2007; Kim, 2007) was modified and applied to gene expression profiles of neural differentiation obtained from cDNA microarray experiments. The framework takes advantage of singular value decomposition (SVD), which detects biologically meaningful gene expression patterns; the dominant eigengene of each cluster clearly represents that cluster's coherent pattern.
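As a minimal sketch of what "dominant eigengene" can mean in practice, the following assumes a cluster is stored as a genes-by-conditions NumPy array and takes the first right singular vector of the row-centered matrix as the eigengene. The function name `dominant_eigengene`, the row-centering step, and the sign convention are illustrative assumptions, not details taken from the framework itself.

```python
import numpy as np

def dominant_eigengene(expr):
    """Return the dominant eigengene of one cluster and its variance share.

    expr: 2-D array, rows are genes, columns are conditions (assumed layout).
    """
    # Center each gene profile so the SVD captures shape, not baseline level.
    centered = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigengene = vt[0]  # first right singular vector: a pattern over conditions
    # Fraction of the cluster's total variance carried by that pattern.
    total = np.sum(s ** 2)
    fraction = float(s[0] ** 2 / total) if total > 0 else 0.0
    # Singular vectors are defined up to sign; orient the eigengene so that
    # the genes load positively on it overall.
    if u[:, 0].sum() < 0:
        eigengene = -eigengene
    return eigengene, fraction
```

A variance fraction close to 1 would indicate that a single pattern describes the cluster well, which is one way to read "coherent" here.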
The advanced clustering process has three steps. (1) A modified K-means step constructs subsets of genes around dominant common expression patterns, according to the similarity between the genes' expression profiles. (2) A refinement algorithm then iteratively concentrates these subsets into subsets that contain only genes with tightly coherent expression patterns. (3) In an enrichment step, genes missed by the earlier steps are added to clusters based on their estimated covariance with each cluster.
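The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the lab's implementation: the "modified K-means" is approximated by a plain K-means-style loop on correlation, coherence is measured as Pearson correlation (normalized covariance) with the cluster's dominant eigengene, and the 0.8 threshold, the function names, and the use of label -1 for unclustered genes are all hypothetical choices.

```python
import numpy as np

def standardize(expr):
    """Center each gene profile and scale it to unit length."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.where(norms == 0, 1.0, norms)

def eigengene(z):
    """Dominant right singular vector of a standardized cluster matrix."""
    u, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[0] if u[:, 0].sum() >= 0 else -vt[0]

def initial_partition(z, k, n_iter=50, seed=0):
    """Step 1 (assumed form): K-means-style partition by correlation."""
    rng = np.random.default_rng(seed)
    centroids = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        # For unit-length centered rows, the dot product is Pearson correlation.
        labels = np.argmax(z @ centroids.T, axis=1)
        for j in range(k):
            members = z[labels == j]
            if len(members):
                c = members.mean(axis=0)
                centroids[j] = c / (np.linalg.norm(c) or 1.0)
    return labels

def refine(z, labels, min_corr=0.8, max_iter=20):
    """Step 2 (assumed form): drop genes weakly correlated with the eigengene."""
    labels = labels.copy()
    for _ in range(max_iter):
        changed = False
        for j in np.unique(labels[labels >= 0]):
            idx = np.flatnonzero(labels == j)
            corr = z[idx] @ eigengene(z[idx])
            drop = idx[corr < min_corr]
            if len(drop):
                labels[drop] = -1  # -1 marks a gene as unclustered
                changed = True
        if not changed:
            break
    return labels

def enrich(z, labels, min_corr=0.8):
    """Step 3 (assumed form): give unclustered genes to the best-matching cluster."""
    labels = labels.copy()
    ids = np.unique(labels[labels >= 0])
    free = np.flatnonzero(labels == -1)
    if len(ids) == 0 or len(free) == 0:
        return labels
    eig = np.stack([eigengene(z[labels == j]) for j in ids])
    corr = z[free] @ eig.T
    best = corr.argmax(axis=1)
    ok = corr[np.arange(len(free)), best] >= min_corr
    labels[free[ok]] = ids[best[ok]]
    return labels
```

With an expression matrix `expr` (genes by conditions), the steps would be chained as `z = standardize(expr)`, then `labels = enrich(z, refine(z, initial_partition(z, k)))`, where `k` is a user-chosen starting number of clusters.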
As a result, 134 clusters were obtained from the expression data of 6,331 genes, and the biological function of each cluster, treated as a module, was predicted using Gene Ontology (GO). In particular, genes related to neurogenesis were significantly concentrated in Clusters 1 and 2. Cluster 1 showed an increasing expression pattern and included Notch1, Ncdn, Unc5b, and others. Cluster 2, in contrast, showed a down-regulated pattern and included Nrp2, Hey1, Zfhx3, and others.
Investigation of the biological functions of each cluster indicated that this clustering framework is a useful tool for analyzing gene expression data and identifying functional modules of genes.