RISS (Academic Research Information Service)

      • Feature Selection

        Vipin Kumar, Sonajharia Minz  Korea Academia-Industrial Cooperation Society 2014 SmartCR Vol.4 No.3

        Relevant feature identification has become an essential task for applying data mining algorithms effectively in real-world scenarios. Therefore, many feature selection methods have been proposed in the literature to obtain relevant features or feature subsets for classification and clustering. This paper introduces the concepts of feature relevance, general procedures, evaluation criteria, and the characteristics of feature selection. It also provides a comprehensive overview, categorization, and comparison of existing feature selection methods, along with guidelines that help users select a feature selection algorithm without detailed knowledge of each algorithm. We conclude this work with real-world applications, challenges, and future research directions of feature selection.
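        A minimal sketch (not from the survey itself) of one filter-style criterion in the family the paper categorizes: rank features by mutual information with the class label and keep the top k. The dataset and the cutoff k are illustrative assumptions.

```python
# Hedged illustration of a filter criterion: score features independently
# against the class label and keep the top-ranked ones.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Mutual information between each feature and the class label.
scores = mutual_info_classif(X, y, random_state=0)

# Select the k highest-scoring features (k chosen arbitrarily here).
k = 10
selected = np.argsort(scores)[::-1][:k]
print("selected feature indices:", selected)
```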

      • Performance Evaluation of Feature Selection Methods on Large Dimensional Databases

        Y. Leela Sandhya Rani, V. Sucharita, Debnath Bhattacharyya, Hye-Jin Kim  Security Engineering Research Support Center 2016 International Journal of Database Theory and Application Vol.9 No.9

        Data mining retrieves knowledge from large amounts of data. Clustering assembles similar objects into one class and dissimilar objects into another. When designing a clustering ensemble over a large-dimensional data space, both the time and space requirements for processing may be inflated. This motivates the use of feature selection methods to remove redundant features and handle noisy data. Feature selection methods fall into filter, wrapper, and hybrid categories. This paper gives a tour of the types of feature selection techniques, and a number of experiments are conducted with the R tool to compare these techniques on different datasets, indicating which techniques are better suited for clustering ensemble design.
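        As a rough illustration of the filter family compared in this paper (the original experiments were run in R; this Python version is only a sketch under assumed data, threshold, and cluster count), a variance filter can be applied before building a clustering:

```python
# Sketch: drop near-constant features with a variance filter, then cluster
# the reduced data and compare against the known labels.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import adjusted_rand_score

X, y = load_digits(return_X_y=True)

# Filter step: remove features whose variance falls below the threshold.
X_reduced = VarianceThreshold(threshold=0.5).fit_transform(X)

# Cluster on the reduced feature space.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_reduced)
print("features kept:", X_reduced.shape[1],
      "ARI:", round(adjusted_rand_score(y, labels), 3))
```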

      • Rough set model based feature selection for mixed-type data with feature space decomposition

        Kim, Kyung-Jun; Jun, Chi-Hyuck  Elsevier 2018 Expert Systems with Applications Vol.103 No.-

        Abstract: Feature selection plays an important role in the classification problems associated with expert and intelligent systems. The central idea behind feature selection is to identify important input features in order to reduce the dimensionality of the input space while maintaining or improving classification performance. Traditional feature selection approaches were designed to handle either categorical or numerical features, but not the mix of both that often arises in real datasets. In this paper, we propose a novel feature selection algorithm for classifying mixed-type data, based on a rough set model, called feature selection for mixed-type data with feature space decomposition (FSMSD). It can handle both categorical and numerical features by combining rough set theory with a heterogeneous Euclidean-overlap metric, and can therefore be applied to mixed-type data. It also uses feature space decomposition to preserve the properties of multi-valued categorical features, thereby reducing information loss and preserving the features' physical meaning. The proposed algorithm was compared with four benchmark methods on real mixed-type and biomedical datasets, and its performance was promising, indicating that it will be helpful to users of expert and intelligent systems. Highlights: interpretability of feature selection for mixed-type data is increased; no transformation procedure is needed for categorical or numerical features; FSMSD selects features that are not biased toward any data type; FSMSD and the benchmark methods are compared on 15 mixed-type datasets.
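        The heterogeneous Euclidean-overlap metric mentioned above combines a range-normalized distance for numerical features with a 0/1 overlap distance for categorical features. A minimal sketch follows; the feature layout and example records are assumptions for illustration, not the paper's data, and the full FSMSD procedure is not reproduced.

```python
# Hedged sketch of the heterogeneous Euclidean-overlap metric (HEOM)
# for comparing two mixed-type records.
import numpy as np

def heom(x, y, numeric_idx, ranges):
    """HEOM distance between two mixed-type records.

    numeric_idx: indices of numerical features.
    ranges: per-numerical-feature range (max - min), same order as numeric_idx.
    Categorical features use the overlap metric: 0 if equal, 1 otherwise.
    """
    total = 0.0
    for j in range(len(x)):
        if j in numeric_idx:
            r = ranges[numeric_idx.index(j)]
            d = abs(float(x[j]) - float(y[j])) / r if r > 0 else 0.0
        else:
            d = 0.0 if x[j] == y[j] else 1.0
        total += d ** 2
    return np.sqrt(total)

# Two hypothetical records: (age, income, occupation, married)
a = [35, 52000, "engineer", "yes"]
b = [29, 48000, "teacher", "yes"]
print(heom(a, b, numeric_idx=[0, 1], ranges=[60, 100000]))
```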

      • Improving feature selection performance using pairwise pre-evaluation

        Li, Songlu; Oh, Sejong  BioMed Central 2016 BMC Bioinformatics Vol.17 No.-

        Background: Biological data such as microarrays contain a huge number of features. Thus, it is necessary to select a small number of features to characterize the entire dataset. All combinations of feature subsets would have to be evaluated to produce an ideal feature subset, but this is impossible with currently available computing power. Feature selection, or feature subset selection, provides a sub-optimal solution within a reasonable amount of time. Results: In this study, we propose an improved feature selection method that uses information from all pairwise evaluations of a given dataset. We modify the original feature selection algorithms to use this pre-evaluation information. The pre-evaluation captures the quality of, and interactions between, pairs of features. The feature subset is improved by using the top-ranking pairs during the selection process. Conclusions: Experimental results demonstrated that the proposed method improved the quality of the feature subsets produced by the modified feature selection algorithms. The proposed method can be applied to microarray and other high-dimensional data.
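        A rough sketch of the pairwise pre-evaluation idea: score every feature pair with a quick classifier and treat the features that appear in the best pairs as promising candidates. The classifier, dataset, and the way the pair ranking is used are assumptions; the paper's modified selection algorithms are not reproduced here.

```python
# Sketch: evaluate every pair of features with cross-validated accuracy,
# then collect the features that occur in the top-ranked pairs.
from itertools import combinations
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
n_features = X.shape[1]

pair_scores = {}
for i, j in combinations(range(n_features), 2):
    acc = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                          X[:, [i, j]], y, cv=3).mean()
    pair_scores[(i, j)] = acc

# Use the top-ranked pairs as pre-evaluation information.
top_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:10]
candidate_features = sorted({f for pair in top_pairs for f in pair})
print("features appearing in the best pairs:", candidate_features)
```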

      • An Approach to Feature Selection for Continuous Features of Objects

        Wang Hong-Wei, Li Guo-He, Li Xue  Security Engineering Research Support Center 2016 International Journal of Multimedia and Ubiquitous Engineering Vol.11 No.4

        A novel approach to feature selection is proposed for data spaces defined over continuous features. The approach obtains a subset of features that can discriminate the class labels of objects with discriminant ability superior or equivalent to that of the original feature set, thereby effectively improving the learning performance and intelligibility of the classification model. According to the spatial distribution of objects and their class labels, the data space is partitioned into subspaces, each with a clear edge and a single class label. These labelled subspaces are then projected onto each continuous feature. The measurement of each feature is estimated for a subspace against all other subspace-projected features by means of statistical significance. By constructing a matrix of the measurements of the subspaces by all features, the subspace-projected features are ranked in descending order of discriminant ability. After evaluating a gain function of the discriminant ability defined by the best-so-far feature subset, the resulting feature subset is determined incrementally. Comprehensive experiments on UCI Repository datasets demonstrate that this subspace-based feature ranking and selection approach greatly improves the effectiveness and efficiency of classification on continuous features.
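        A loose, simplified analogue of the ranking-then-incremental-selection scheme described above. The subspace partitioning itself is not reproduced; a per-feature ANOVA F statistic stands in for the subspace-based significance measure, and the dataset and classifier are assumptions.

```python
# Sketch: rank continuous features by a per-feature statistical score,
# then grow the subset only while a gain criterion (CV accuracy) improves.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Rank features in descending order of their ANOVA F statistic.
f_scores, _ = f_classif(X, y)
order = np.argsort(f_scores)[::-1]

selected, best = [], 0.0
for f in order:
    trial = selected + [f]
    acc = cross_val_score(KNeighborsClassifier(), X[:, trial], y, cv=5).mean()
    if acc > best:          # keep the feature only if the gain improves
        selected, best = trial, acc
print("selected features:", selected, "cv accuracy:", round(best, 3))
```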

      • SCOPUS, KCI-indexed

        Biological Feature Selection and Disease Gene Identification using New Stepwise Random Forests

        Hwang, Wook-Yeon  Korean Institute of Industrial Engineers 2017 Industrial Engineering & Management Systems Vol.16 No.1

        Identifying disease genes from the human genome is a critical task in biomedical research. Important biological features for distinguishing disease genes from non-disease genes have mainly been selected with traditional feature selection approaches. However, the traditional approaches unnecessarily consider many unimportant biological features. As a result, although some existing classification techniques have been applied to disease gene identification, the prediction performance has not been satisfactory. A small set of the most important biological features can enhance the accuracy of disease gene identification and provide potentially useful knowledge for biologists or clinicians, who can further investigate the selected biological features as well as the potential disease genes. In this paper, we propose a new stepwise random forests (SRF) approach for biological feature selection and disease gene identification. The SRF approach consists of two stages. In the first stage, only important biological features are iteratively selected in a forward selection manner based on one-dimensional random forest regression, where the updated residual vector is used as the current response vector. This yields a small set of important biological features. In the second stage, random forests classification over the selected biological features is applied to identify disease genes. Our extensive experiments show that the proposed SRF approach outperforms existing feature selection and classification techniques in terms of biological feature selection and disease gene identification.
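        A hedged sketch of the two-stage SRF idea as described in this abstract: a forward pass that repeatedly fits a one-dimensional random-forest regression to the current residual vector and keeps the best-fitting feature, followed by random-forest classification on the chosen features. The dataset, number of selected features, forest sizes, and goodness-of-fit criterion are assumptions.

```python
# Sketch of stepwise random forests: stage 1 selects features against an
# updated residual vector; stage 2 classifies with the selected features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

residual = y.astype(float).copy()
selected, remaining = [], list(range(X.shape[1]))

for _ in range(5):                              # pick 5 features (arbitrary)
    best_feat, best_r2, best_pred = None, -np.inf, None
    for f in remaining:
        rf = RandomForestRegressor(n_estimators=50, random_state=0)
        rf.fit(X[:, [f]], residual)
        r2 = rf.score(X[:, [f]], residual)      # 1-D fit quality
        if r2 > best_r2:
            best_feat, best_r2, best_pred = f, r2, rf.predict(X[:, [f]])
    selected.append(best_feat)
    remaining.remove(best_feat)
    residual = residual - best_pred             # update the response vector

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X[:, selected], y, cv=5).mean()
print("selected:", selected, "cv accuracy:", round(acc, 3))
```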

      • SCOPUS, KCI-indexed

        A Feature Selection-based Ensemble Method for Arrhythmia Classification

        Erdenetuya Namsrai, Tsendsuren Munkhdalai, Meijing Li, Jung Hoon Shin, Oyun Erdene Namsrai, Keun Ho Ryu  Korea Information Processing Society 2013 Journal of Information Processing Systems Vol.9 No.1

        In this paper, a novel method is proposed for building an ensemble of classifiers using a feature selection schema. The feature selection schema identifies the best feature sets that affect arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then, classification models are built using each feature subset. Finally, we combine the classification models with a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the extraction order of the feature subsets. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on these feature subsets and formed the classifier ensemble using the voting approach. Our method can improve classification accuracy on high-dimensional datasets. The performance of each classifier, and of their ensemble, was higher than that of a classifier based on the whole feature space of the dataset. Classification performance was improved, and a more stable classification model could be constructed with the proposed approach.
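        A hedged sketch of the ensemble construction described above: one classifier per disjoint feature subset, with votes weighted by a score that combines accuracy (one minus the error rate) and a selection-rate term that decays with the subset's extraction order. The dataset, the subsets, the base classifier, and the exact weighting formula are assumptions, not the paper's specification.

```python
# Sketch: build one classifier per feature subset and combine them with a
# weighted vote; weights mix CV accuracy with a rank-based selection rate.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hypothetical disjoint feature subsets, listed in extraction order.
subsets = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

models, weights = [], []
for order, cols in enumerate(subsets):
    clf = DecisionTreeClassifier(random_state=0)
    accuracy = cross_val_score(clf, X_tr[:, cols], y_tr, cv=3).mean()  # 1 - error rate
    selection_rate = 1.0 / (order + 1)      # earlier-extracted subsets weigh more
    models.append((clf.fit(X_tr[:, cols], y_tr), cols))
    weights.append(accuracy * selection_rate)

# Weighted vote over the class labels.
votes = np.zeros((X_te.shape[0], len(np.unique(y))))
for (clf, cols), w in zip(models, weights):
    for i, c in enumerate(clf.predict(X_te[:, cols])):
        votes[i, c] += w
print("ensemble accuracy:", round((votes.argmax(axis=1) == y_te).mean(), 3))
```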

      • KCI-indexed

        Mixed Feature Set and Hybrid Feature Selection Method for Efficient Document Classification

        Joo-ho In, Jung-ho Kim, Soo-hoan Chae  Korean Society for Internet Information 2013 Journal of Internet Computing and Services Vol.14 No.5

        A novel approach to feature selection, an important preprocessing step for automatic online document classification, is proposed. Most previous research selects features using information from a single population only. In this paper, a mixed feature set is constructed by selecting features from multiple populations as well as a single population, so that the feature set draws on varied sources of information. The mixed feature set consists of two feature sets: the original feature set, made up of words extracted from the documents, and a transformed feature set, made up of features newly generated from the original set by LSA. A hybrid feature selection method that combines filter and wrapper methods is applied to the mixed feature set to obtain an optimal feature set, which is then used for document classification experiments. By taking into account information from features of diverse populations, improved classification performance was expected, and classification experiments on Internet news articles confirmed over 90% accuracy. In particular, both recall and precision exceeded 90%, with a low deviation between the two.
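        A hedged sketch of the mixed-feature idea: original term features plus LSA-derived features, a filter pass, then a very light wrapper-style check. The corpus (a 20 Newsgroups subset, downloaded on first use), dimensionalities, scoring function, and classifier are illustrative assumptions rather than the paper's setup.

```python
# Sketch: mixed feature set = original tf-idf terms + LSA features,
# filtered by a univariate score and checked against the original terms.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

news = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos", "comp.graphics"],
                          remove=("headers", "footers", "quotes"))
X_terms = TfidfVectorizer(max_features=2000).fit_transform(news.data).toarray()
y = news.target

# Transformed feature set: LSA (truncated SVD) over the term matrix.
X_lsa = TruncatedSVD(n_components=50, random_state=0).fit_transform(X_terms)

# Mixed feature set = original terms + LSA features.
X_mixed = np.hstack([X_terms, X_lsa])

# Filter stage: keep the top features by ANOVA F score.
X_filtered = SelectKBest(f_classif, k=200).fit_transform(X_mixed, y)

# Wrapper stage (greatly simplified): compare the filtered mixed set
# against the original terms alone by cross-validated accuracy.
clf = LogisticRegression(max_iter=1000)
acc_mixed = cross_val_score(clf, X_filtered, y, cv=3).mean()
acc_orig = cross_val_score(clf, X_terms, y, cv=3).mean()
print("mixed+filtered:", round(acc_mixed, 3), "original terms:", round(acc_orig, 3))
```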
