RISS (Research Information Sharing Service)

      • KCI accredited

        Comparison of imputation methods for item nonresponse involving unfolding bracket questions

        Da Hee Lee, Hyonggin An Korean Data Analysis Society 2023 Journal of the Korean Data Analysis Society Vol.25 No.1

        In social surveys, unfolding bracket questions are used to reduce the information lost to item nonresponse. Item nonresponse followed by unfolding bracket questions has a structure similar to interval-censored survival data, yet research on imputation methods for such data is limited. This study examines the performance of imputation with and without unfolding bracket information and compares three methods that use it. The first is linear regression-based nearest-neighbor hot-deck imputation. The second draws each individual's imputed value from a uniform distribution, so that every value within that individual's bracket is equally likely. The third uses a survival function estimated by nonparametric maximum likelihood. Hot-deck imputation serves as the method that does not use bracket information. The methods were evaluated by simulation, and wave 3 of the Korean Longitudinal Study of Aging (KLoSA) illustrates their application to real data. Imputation using unfolding bracket questions outperformed imputation without them. Among the bracket-based methods, the survival-function method performed well when the nonresponse rate was relatively low. When the nonresponse rate was high, the regression-based nearest-neighbor hot-deck method gave better results in terms of mean bias, while the survival-function method gave better results in terms of root mean square error.
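
        The second method in this abstract (a uniform draw within each respondent's bracket) can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: the field names `income`, `low`, and `high` are hypothetical, and the bracket bounds are assumed to be already known from the follow-up questions.

```python
import random


def impute_uniform_in_bracket(lower, upper, rng):
    """Draw one value uniformly from the bracket [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return rng.uniform(lower, upper)


def impute_income(records, rng=None):
    """Fill missing 'income' values from each record's own bracket.

    Field names ('income', 'low', 'high') are hypothetical.
    Records with an observed income are copied unchanged.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    completed = []
    for rec in records:
        rec = dict(rec)  # do not mutate the caller's data
        if rec["income"] is None:
            rec["income"] = impute_uniform_in_bracket(rec["low"], rec["high"], rng)
        completed.append(rec)
    return completed


if __name__ == "__main__":
    data = [
        {"income": 250.0, "low": None, "high": None},  # observed
        {"income": None, "low": 100.0, "high": 200.0},  # bracket only
    ]
    for row in impute_income(data):
        print(row)
```

        Because each draw depends only on the respondent's own bracket, this approach ignores covariates, which is one way it differs from the regression-based and survival-function methods in the abstract.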

      • KCI candidate

        A comparison of imputation methods using machine learning models

        Heajung Suh,Jongwoo Song The Korean Statistical Society 2023 Communications for statistical applications and me Vol.30 No.3

        Handling missing values is essential to building a good prediction model. The easiest approach is to use only complete cases, but this can lose information and lead to invalid conclusions. Imputation replaces missing data with substitute values obtained from information in the dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares imputation techniques for datasets with mixed data types across various settings, such as data size, missing ratio, and missing mechanism. To evaluate each method on mixed datasets, we propose a new imputation performance measure (IPM), a unified measure applicable to both numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results in terms of imputation performance and computation time.

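      • KCI accredited

        A study of missing-value imputation techniques for water quality data

        전진영, 민연아 Korea Society of Computer and Information 2024 Journal of the Korea Society of Computer and Information Vol.29 No.4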

        Many researchers evaluate water quality using a variety of models. Such models require a dataset without missing values, but in practice observed datasets contain many missing values. Simply deleting samples with missing values can distort the distribution of the underlying data and risks biasing the model's predictions when the missing mechanism is not MCAR. To find a technique suited to missing values in water quality data, this study tested several imputation techniques based on the existing KNN and MICE imputation, with and without the generative neural network models Autoencoder (AE) and Denoising Autoencoder (DAE). The results show that Combined Imputation, which averages the KNN and MICE results without generative networks, gave estimates closest to the true values. When the observed water quality dataset was imputed with this technique and evaluated with binary classification models based on support vector machine and ensemble algorithms, Accuracy, F1 score, ROC-AUC score, and MCC (Matthews Correlation Coefficient) all improved compared with deleting the samples that had missing values.
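
        The averaging step behind the combined imputation described here can be sketched as below. This is a minimal illustration under stated assumptions: the two imputed tables are assumed to come from separately run KNN and MICE imputers (not shown), missing cells are marked with `None`, and the function name `combine_imputations` is hypothetical rather than taken from the paper.

```python
def combine_imputations(original, imputed_a, imputed_b):
    """Average two imputed copies of a table, cell by cell.

    Only cells that are missing (None) in `original` are replaced;
    observed cells are kept exactly as they were.
    """
    combined = []
    for row_o, row_a, row_b in zip(original, imputed_a, imputed_b):
        combined.append([
            (a + b) / 2 if o is None else o
            for o, a, b in zip(row_o, row_a, row_b)
        ])
    return combined


if __name__ == "__main__":
    observed = [[7.1, None], [None, 4.0]]
    from_knn = [[7.1, 2.0], [3.0, 4.0]]   # assumed output of a KNN imputer
    from_mice = [[7.1, 4.0], [5.0, 4.0]]  # assumed output of a MICE imputer
    print(combine_imputations(observed, from_knn, from_mice))
```

        Restricting the average to originally missing cells keeps the observed measurements untouched even if one of the imputers returns slightly altered values for them.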

      • KCI accredited

        Research (Open Access): Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

        Abbas Mikhchi, Mahmood Honarvar, Nasser Emam Jomeh Kashan, Saeed Zerehdaran, Mehdi Aminafshar Korean Society of Animal Sciences and Technology (formerly the Korean Society of Animal Science) 2016 Journal of Animal Science and Technology Vol.58 No.1

        Background: Genotype imputation is an important process for predicting unknown genotypes. It uses a reference population with dense genotypes to predict missing genotypes for both human and animal genetic variation at low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models capable of predicting missing values of a marker. Methods: This study compared strategies and factors affecting the imputation accuracy of parent-offspring trios from a lower-density SNP panel (5K) to a high-density SNP panel (10K) using three boosting methods: TotalBoost (TB), LogitBoost (LB), and AdaBoost (AB). The methods were applied to simulated data to impute the untyped SNPs in parent-offspring trios. Four datasets were simulated: G1 (100 trios with 5K SNPs), G2 (100 trios with 10K SNPs), G3 (500 trios with 5K SNPs), and G4 (500 trios with 10K SNPs). In all four datasets the parents were genotyped completely and the offspring were genotyped with a lower-density panel. Results: LB outperformed AB and TB in imputation accuracy. Computation time differed between methods, and AB was the fastest algorithm. Higher SNP densities increased imputation accuracy. The larger number of trios (i.e. 500) improved the performance of LB and TB. Conclusions: All three methods perform well in terms of imputation accuracy, and the dense chip is recommended for imputation of parent-offspring trios.

      • KCI accredited

        Comparison of imputation methods for nonresponse in multiple-response questions

        송주원 Korean Data Analysis Society 2014 Journal of the Korean Data Analysis Society Vol.16 No.2

        Imputation of survey data that are incomplete due to nonresponse normally assumes that a respondent gives only one answer per question. When a multiple-choice question allows multiple responses, however, respondents can select every item that applies, and research on imputing nonresponse to such multiple-response questions is limited. This study considers three imputation methods for nonresponse to multiple-response questions. The first transforms each item of the multiple-choice question into a binary variable and imputes each binary variable independently. The second considers all possible response combinations and imputes each nonresponse with one of these combinations. The third is common-donor hot-deck imputation. The three methods were evaluated by simulation, and a multiple-response question from wave 1 of the Korean Longitudinal Study of Aging (KLoSA) illustrates their application to real data. All three methods estimated the proportions adequately, and the method that considers response combinations gave slightly better results.
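
        The binary-indicator transformation and the common-donor idea from this abstract can be illustrated with a short sketch. It is a simplification under stated assumptions: the donor is drawn at random from all respondents, whereas a real application would usually restrict donors to similar respondents (for example, within imputation cells). The function names and the item labels are hypothetical.

```python
import random


def to_indicators(selected, items):
    """Turn a set of chosen items into one 0/1 variable per item."""
    return {item: int(item in selected) for item in items}


def common_donor_hotdeck(responses, items, rng=None):
    """Impute whole response patterns from a single donor.

    `responses` holds a set of chosen items per respondent, or None
    for nonresponse. Each nonrespondent receives every indicator from
    one randomly chosen donor, so the donor's joint pattern is kept.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    donors = [r for r in responses if r is not None]
    if not donors:
        raise ValueError("at least one complete respondent is required")
    completed = []
    for r in responses:
        source = r if r is not None else rng.choice(donors)
        completed.append(to_indicators(source, items))
    return completed


if __name__ == "__main__":
    items = ["A", "B", "C"]
    answers = [{"A", "B"}, None, {"C"}]
    for row in common_donor_hotdeck(answers, items):
        print(row)
```

        Copying all indicators from one donor avoids impossible or implausible combinations that can arise when each binary variable is imputed independently.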

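      • KCI accredited

        Imputation of nonresponse in wage data in the Industry and Occupation Employment Structure Survey (OES)

        오민홍, 천영민 Korean Data Analysis Society 2009 Journal of the Korean Data Analysis Society Vol.11 No.3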

        Employment surveys generally ask respondents about wages or income, and respondents often avoid these questions or answer them inaccurately. Income data are widely used to analyze the effects of schooling, certificates, job training, and job changes, but nonresponse frequently excludes cases from the analysis. This study imputes nonresponse in the wage variable of the OES. We fitted separate regression models by gender with wage as the dependent variable and imputed using the resulting regression coefficients, and we also applied conditional mean imputation. The results show that model-based regression imputation has a smaller RMSE than mean imputation and largely retains the characteristics of the original data. Compared with data-based regression imputation, model-based regression imputation performs almost the same but is more efficient in building the model and applying the imputation.
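
        Regression imputation with a separate model per gender, as described in this abstract, can be sketched as follows. This is a deliberately small illustration, not the study's wage equation: it assumes a single hypothetical predictor (`years_schooling`), ordinary least squares with one covariate, and at least two observed wages with differing predictor values in each group.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def regression_impute_by_group(rows, group_key="gender",
                               x_key="years_schooling", y_key="wage"):
    """Fill missing y values with predictions from a per-group line.

    Key names are hypothetical. One model is fitted per group using
    only the rows of that group whose y value is observed.
    """
    models = {}
    for g in {r[group_key] for r in rows}:
        obs = [r for r in rows if r[group_key] == g and r[y_key] is not None]
        models[g] = fit_line([r[x_key] for r in obs], [r[y_key] for r in obs])
    completed = []
    for r in rows:
        r = dict(r)  # do not mutate the caller's data
        if r[y_key] is None:
            intercept, slope = models[r[group_key]]
            r[y_key] = intercept + slope * r[x_key]
        completed.append(r)
    return completed


if __name__ == "__main__":
    sample = [
        {"gender": "F", "years_schooling": 10, "wage": 100.0},
        {"gender": "F", "years_schooling": 12, "wage": 120.0},
        {"gender": "F", "years_schooling": 14, "wage": None},
    ]
    print(regression_impute_by_group(sample)[-1])
```

        Conditional mean imputation would instead replace each missing wage with the mean of observed wages in the same group, which ignores the covariate and tends to shrink the variance of the imputed variable.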

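      • KCI accredited

        The theory of objective imputation and legal causation in Anglo-American law

        Kim, Jong-Goo Korean Criminal Law Association 2009 Journal of Criminal Law Vol.21 No.4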

        Some crimes require the prosecution to prove that the defendant caused a particular result. Under Anglo-American law, the necessary causation is of two types: 'factual causation' and 'legal causation'. The dominant test for cause-in-fact is the sine qua non, or but-for, test. If the harm would not have occurred unless the defendant had engaged in the conduct, there is cause in fact. However, there is no equivalent dominant test of legal cause. Legal causation is a flexible analysis that involves a variety of policy considerations and ultimately asks whether, as a matter of policy, the defendant should be held responsible for a particular result. The structure of the theory of causation in Korean and German law is almost the same as in Anglo-American law. Under Korean and German law, causation has two aspects: 'natural causation' and 'objective imputation'. Natural causation is a matter of fact and is determined by the sine qua non test, as in Anglo-American law. Questions of objective imputation cannot be answered by the physical sciences alone. Thus, they are not facts that can be uncovered by scientifically examining cause and effect in the real world. The existence of factual causation alone will not support a finding of criminal liability. In addition, there must be proof of legal causation or the possibility of objective imputation. The test for legal causation and objective imputation is not a matter of fact but a matter of law, to be determined after evaluating various policy considerations. The tests for legal causation or objective imputation vary according to the point of view. Thus, it is important to develop tests for legal causation or objective imputation that can be generally accepted. For that purpose, Korean scholars need to draw on scholarly results from both German law and Anglo-American law.

      • KCI accredited

        A study on nonresponse imputation in the National Health Survey

        도세록, 엄정국, 이관제 Korean Data Analysis Society 2008 Journal of the Korean Data Analysis Society Vol.10 No.4

        Nonresponse is unavoidable in sample surveys, and in a multipurpose survey it prevents integrated data analysis. The primary objective of this study is to assess effective ways to impute nonresponse in a multipurpose sample survey. There are several ways to construct nonresponse adjustment classes. We construct segmentation classes with a decision tree algorithm and use them as auxiliary variables for multiple imputation, because decision trees can combine many kinds of variables and classify cases into homogeneous classes nonparametrically. Specifically, we use CHAID (chi-squared automatic interaction detection), which allows multiway splits, and build the tree-based classes by specifying the maximum tree depth, the number of split nodes per branch, and the significance level of the chi-squared statistic. In addition to imputation, we propose a post-weighting adjustment by generalized regression to guard against underestimation of the variance. A simulation study is based on the 2001 Korean National Health Survey, which consists of the Family Interview Survey, the Health Behavior Survey, and the Examination Survey. From the Family Interview Survey we select 12 variables and input them into CHAID. CHAID yields 10 nonresponse segmentation classes describing nonresponse propensities for the Health Behavior Survey and 15 classes for the Examination Survey. We use the segmentation classes as covariates in multiple imputation for the missing values of the two surveys. To evaluate the imputation method, we conduct a simulation and compare the relative biases of outcomes that use segmentation classes as auxiliary variables with those that use the real variables. The segmentation method reduces relative biases slightly in both means and variances. Using segmentation classes as covariates in multiple imputation appears appropriate and eases the difficulties of building imputation models and choosing auxiliary variables. To estimate the population total without bias, we use the GREG (generalized regression estimator) calibration estimator and compare results after GREG estimation by simulation. GREG calibration increases the variance estimates steadily in the Examination Survey but abruptly in the Health Behavior Survey.

      • KCI candidate

        Application of discrete Weibull regression model with multiple imputation

        Yoo, Hanna The Korean Statistical Society 2019 Communications for statistical applications and me Vol.26 No.3

        In this article we extend the discrete Weibull regression model to the presence of missing data. Discrete Weibull regression models can be adapted to various types of dispersion in the data; however, they are not widely used. Recently Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that work by using multiple imputation under several settings and compare the results. The purpose of this study is to show the merit of using multiple imputation when discrete count data contain missing values. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII), from 2016, to assess the factors influencing the variable 1 month hospital stay, and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression using multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data with a unified model.
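
        Multiple imputation produces one estimate per imputed dataset, and these estimates have to be pooled. The abstract does not say how the pooling was done; the sketch below assumes Rubin's rules, the standard combining step, and the function name `pool_rubin` is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules.

    estimates: point estimate from each of the m imputed datasets
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


if __name__ == "__main__":
    # Hypothetical coefficient estimates from three imputed datasets.
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var)
```

        The between-imputation term is what single imputation omits, which is why single imputation tends to understate uncertainty about the missing values.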

      • KCI excellent accredited

        Large tests of independence in incomplete two-way contingency tables using fractional imputation

        Shin Soo Kang, Michael D. Larsen Korean Data and Information Science Society 2015 Journal of the Korean Data and Information Science Society Vol.26 No.4

        Imputation procedures fill in missing values, thereby enabling complete-data analyses. Fully efficient fractional imputation (FEFI) and multiple imputation (MI) create multiple versions of the missing observations, thereby reflecting uncertainty about their true values. Methods have been described for hypothesis testing with multiple imputation. Fractional imputation assigns weights to the observed data to compensate for missing values. The focus of this article is the development of tests of independence using FEFI for partially classified two-way contingency tables. Wald and deviance tests of independence under FEFI are proposed. Simulations are used to compare type I error rates and power. The partially observed marginal information is useful for estimating the joint distribution of cell probabilities, but it is not useful for testing association. FEFI compares favorably to other methods in simulations.
