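Effects of Test Length, Item Composition Ratio, and Cut Score Location of Mixed-Format Tests on the Estimation of Classification Accuracy and Classification Consistency Using Item Response Theory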

Multilingual Abstract

In recent years, mixed-format tests composed of multiple-choice items and free-response items have been widely used in criterion-referenced testing, from classroom assessment to large-scale standardized tests. This format combines the objectivity and scoring efficiency of multiple-choice items with the ability of free-response items to measure examinees' understanding more comprehensively. Furthermore, such tests are increasingly developed on the basis of item response theory (IRT), which is useful for solving many problems in educational measurement and whose models give test results more practical meaning. Likewise, standard-setting methods based on IRT are increasingly used to set cut scores in criterion-referenced assessment.
Given this situation, determining classification accuracy and classification consistency requires a suitable estimation method. These two indices can be regarded as the validity and reliability of classification decisions in criterion-referenced assessment. The method should use the same test theory and account for the psychometric characteristics of mixed-format tests. Several methods have been proposed for estimating the classification indices of mixed-format tests from a single test administration. However, few studies have evaluated the performance of the Rudner method and the Guo method, which set the cut score on the IRT ability scale.
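The sketch below gives a rough idea of how the Rudner method turns IRT ability estimates into classification indices. It makes two assumptions:

- Each examinee's true ability is normally distributed around the estimate θ̂, with that estimate's standard error.
- The observed classification is the category into which θ̂ falls.

The consistency line (expected agreement between two independent classifications, Σ p_k²) and the kappa line follow a commonly used extension. They are assumptions, not details given in this abstract. All function names are hypothetical.

```python
from statistics import NormalDist


def category_probs(theta_hat, se, cuts):
    """Probability of each category when ability ~ N(theta_hat, se**2); se must be > 0."""
    dist = NormalDist(mu=theta_hat, sigma=se)
    cdf = [0.0] + [dist.cdf(c) for c in sorted(cuts)] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


def classify(theta, cuts):
    """Category index = number of cut scores at or below theta."""
    return sum(theta >= c for c in cuts)


def rudner_indices(theta_hats, ses, cuts):
    """Rudner-style accuracy, consistency, and kappa (normal approximation, assumed)."""
    n = len(theta_hats)
    rows = [category_probs(th, se, cuts) for th, se in zip(theta_hats, ses)]
    # Accuracy: average probability that the true ability lies in the observed category.
    accuracy = sum(row[classify(th, cuts)] for row, th in zip(rows, theta_hats)) / n
    # Consistency (assumed extension): chance that two independent classifications agree.
    consistency = sum(sum(p * p for p in row) for row in rows) / n
    # Chance agreement from the marginal category proportions, then Cohen's kappa.
    marginal = [sum(row[k] for row in rows) / n for k in range(len(cuts) + 1)]
    chance = sum(m * m for m in marginal)
    kappa = (consistency - chance) / (1.0 - chance)
    return accuracy, consistency, kappa
```

Take a single cut score at 0. An examinee with θ̂ = -1 and SE = 0.5 has probability Φ(2) ≈ 0.977 of truly belonging to the lower category. Each such examinee therefore contributes about 0.977 to the accuracy average.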
Thus, this study used a simulation to address three questions. First, do the classification indices estimated by the Rudner method and the Guo method differ according to the length, item composition ratio, and cut score location of mixed-format tests? Second, do these three conditions interact? Third, which of the two methods produces more accurate estimates?
For this purpose, mixed-format test data were generated for each study condition. Multiple-choice items followed the two-parameter logistic model and free-response items followed the generalized partial credit model. Tests consisted of 20 or 60 items, with free-response items making up 10%, 30%, or 50% of each test. Examinee abilities were drawn from the standard normal distribution. Next, examinees' ability parameters were estimated by maximum likelihood. Classification accuracy, classification consistency, and the kappa coefficient were then estimated with the Rudner method and the Guo method at cut scores of -1.0, -0.5, 0, 0.5, and 1.0. In addition, the "true" classification indices, which serve as the criterion for evaluating the two methods, were calculated and compared with the estimates. The standard error, bias, and root mean square error (RMSE) of the estimates were computed for each method.
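The abstract names the item response models and evaluation criteria without writing them out. Their standard forms are given below for reference.

The notation is ours. The abstract does not say whether the study includes a scaling constant D (often 1.7) in the logistic models, so it is omitted here. In the evaluation formulas:

- τ is the true value of a classification index in a given condition.
- τ̂_r is its estimate in replication r, for r = 1 to R.

The SE uses a 1/R divisor (an assumption; some studies use R - 1), so that RMSE² = Bias² + SE².

```latex
% Two-parameter logistic model (multiple-choice item i)
P_i(X_i = 1 \mid \theta) = \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}

% Generalized partial credit model (free-response item j, categories k = 0, ..., m_j)
P_j(X_j = k \mid \theta) =
  \frac{\exp\left[\sum_{v=0}^{k} a_j(\theta - b_{jv})\right]}
       {\sum_{c=0}^{m_j} \exp\left[\sum_{v=0}^{c} a_j(\theta - b_{jv})\right]},
\qquad \text{with } \sum_{v=0}^{0} a_j(\theta - b_{jv}) \equiv 0

% Cohen's kappa: P = classification consistency, P_e = chance agreement
\kappa = \frac{P - P_e}{1 - P_e}

% Evaluation criteria over R replications, with \bar{\hat\tau} = \frac{1}{R}\sum_{r=1}^{R}\hat\tau_r
\mathrm{Bias} = \bar{\hat\tau} - \tau, \qquad
\mathrm{SE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat\tau_r - \bar{\hat\tau}\right)^2}, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat\tau_r - \tau\right)^2}
```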
The results of this study are summarized as follows. First, the longer the mixed-format test, the higher the classification indices, regardless of method. Second, the classification indices tended to increase as the proportion of free-response items increased, and this pattern was more pronounced when the test was short. Third, as the cut score approached zero, the classification accuracy and consistency indices decreased, while the kappa coefficient increased.
The conclusions drawn from these results are as follows. First, it is reasonable to use both methods to estimate the classification accuracy and consistency of mixed-format tests. For the kappa coefficient, however, caution is needed, because both methods may yield inaccurate values depending on test length and cut score location. Second, accurate and consistent evaluation of examinees' achievement levels requires mixed-format tests with a sufficient number of items. Third, when administering a mixed-format test with few items, the ratio of multiple-choice to free-response items should be chosen with care. Fourth, when the cut score lies at a low or high level of the examinee ability distribution, the Guo method performs relatively worse than the Rudner method.

Korean Abstract

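Recently, many tests used for criterion-referenced assessment, from classroom assessment to large-scale standardized tests, have adopted mixed-format designs. These tests combine multiple-choice items, which ensure objective and efficient scoring, with free-response items, which can measure examinees' higher-order thinking skills more comprehensively. Such tests are increasingly developed on the basis of item response theory. This theory is useful for solving many problems in educational measurement, and its models give test results more practical meaning. Likewise, standard-setting methods based on item response theory are increasingly applied when setting cut scores in criterion-referenced assessment.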
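These trends in educational practice call for a matching way to check classification decisions about examinees' achievement levels in criterion-referenced assessment. The validity and reliability of those decisions correspond to classification accuracy and classification consistency. Identifying them requires a classification index estimation method that uses the same test theory and accounts for the measurement issues of mixed-format tests. Accordingly, several methods have been proposed for determining the classification indices of mixed-format tests from a single test administration. However, few studies have evaluated the performance of the Rudner method and the Guo method, which set the cut score on the item response theory ability scale and estimate classification indices on that scale.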
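The Guo method is often described as replacing the normal approximation with the likelihood of each examinee's full response pattern. The sketch below illustrates that idea under our own assumptions:

- The likelihood is evaluated on an evenly spaced θ grid over [-4, 4], which is equivalent to a flat prior on that range.
- The likelihood is normalized and summed within each category defined by the cut scores.
- Accuracy is the average probability of each examinee's observed category, for example the category of the maximum likelihood estimate θ̂.
- Consistency is the average of Σ p_k², and kappa corrects consistency for chance agreement.

Function names and grid settings are hypothetical, and the thesis's exact formulation may differ.

```python
import math


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities; an item with a single step is the 2PL model."""
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    zmax = max(z)  # subtract the max before exponentiating for numerical stability
    ez = [math.exp(v - zmax) for v in z]
    total = sum(ez)
    return [e / total for e in ez]


def log_likelihood(theta, items, responses):
    """Log-likelihood of a response pattern; items are (a, steps) tuples."""
    return sum(math.log(gpcm_probs(theta, a, steps)[x])
               for (a, steps), x in zip(items, responses))


def likelihood_category_probs(items, responses, cuts, lo=-4.0, hi=4.0, n_grid=800):
    """Normalized likelihood mass in each cut-score category (flat prior on the grid, assumed)."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    logs = [log_likelihood(t, items, responses) for t in grid]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]  # rescale to avoid underflow
    total = sum(weights)
    cuts = sorted(cuts)
    probs = [0.0] * (len(cuts) + 1)
    for t, w in zip(grid, weights):
        probs[sum(t >= c for c in cuts)] += w / total
    return probs


def indices_from_probs(prob_rows, observed_categories):
    """Accuracy, consistency, and kappa from per-examinee category probabilities."""
    n = len(prob_rows)
    n_cat = len(prob_rows[0])
    accuracy = sum(row[k] for row, k in zip(prob_rows, observed_categories)) / n
    consistency = sum(sum(p * p for p in row) for row in prob_rows) / n
    marginal = [sum(row[k] for row in prob_rows) / n for k in range(n_cat)]
    chance = sum(m * m for m in marginal)
    # Undefined (division by zero) if every examinee falls in a single category.
    kappa = (consistency - chance) / (1.0 - chance)
    return accuracy, consistency, kappa
```

Here an item is a hypothetical `(a, steps)` tuple. A multiple-choice item has one step parameter, and a free-response item has one step per score category above zero. This lets both item types share the same probability function.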
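Therefore, this study used a simulation to examine three questions. First, do the classification indices estimated by the Rudner and Guo methods differ by the length, item composition ratio, and cut score location of mixed-format tests? Second, do these three study conditions interact? Third, which of the two methods yields more accurate estimates? To this end, the two-parameter logistic model was used for multiple-choice items and the generalized partial credit model for free-response items. Test length was set to 20 or 60 items. For each test length, the proportion of free-response items was set to 10%, 30%, or 50%. Simulated data were generated repeatedly for each study condition, with examinee abilities drawn from the standard normal distribution. Next, examinees' ability parameters were estimated by maximum likelihood. Each method then produced estimates of classification accuracy, classification consistency, and the kappa coefficient at cut scores of -1.0, -0.5, 0, 0.5, and 1.0. In addition, the true classification indices were calculated for each study condition; these serve as the criterion for evaluating the performance and accuracy of the two methods. They were compared with the estimates to compute the standard error, bias, and root mean square error of the estimates for each method.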
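The data-generation and ability-estimation steps can be sketched in Python as follows. This is a minimal illustration, not the study's code, and it rests on several assumptions:

- The item-parameter distributions in `make_mixed_test` are hypothetical: discriminations come from U(0.8, 2.0), locations from N(0, 1), and each free-response item has three step parameters.
- All function names are ours.
- A 2PL item is represented as a GPCM item with a single step, since the two models coincide in that case.
- The maximum likelihood estimate is found by bisection on the score equation, which is decreasing in θ for these models.
- The estimate is clamped to [-4, 4] when it falls outside that range or does not exist, as with an all-zero response pattern.

```python
import math
import random


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities; with a single step this equals the 2PL model."""
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    zmax = max(z)  # subtract the max before exponentiating for numerical stability
    ez = [math.exp(v - zmax) for v in z]
    total = sum(ez)
    return [e / total for e in ez]


def make_mixed_test(n_items, fr_share, rng, n_steps=3):
    """Hypothetical item set of (a, steps) tuples: MC items get one step, FR items n_steps."""
    n_fr = round(n_items * fr_share)
    items = []
    for _ in range(n_items - n_fr):  # multiple-choice items (2PL)
        items.append((rng.uniform(0.8, 2.0), [rng.gauss(0.0, 1.0)]))
    for _ in range(n_fr):  # free-response items (GPCM)
        loc = rng.gauss(0.0, 1.0)
        steps = sorted(loc + rng.uniform(-1.0, 1.0) for _ in range(n_steps))
        items.append((rng.uniform(0.8, 2.0), steps))
    return items


def simulate_responses(theta, items, rng):
    """Draw one category score per item from the model probabilities."""
    responses = []
    for a, steps in items:
        probs = gpcm_probs(theta, a, steps)
        responses.append(rng.choices(range(len(probs)), weights=probs)[0])
    return responses


def score_and_information(theta, items, responses):
    """First derivative of the log-likelihood and the test information at theta."""
    score = info = 0.0
    for (a, steps), x in zip(items, responses):
        probs = gpcm_probs(theta, a, steps)
        mean = sum(k * p for k, p in enumerate(probs))
        var = sum(k * k * p for k, p in enumerate(probs)) - mean * mean
        score += a * (x - mean)
        info += a * a * var
    return score, info


def mle_theta(items, responses, lo=-4.0, hi=4.0, tol=1e-8):
    """MLE of theta by bisection on the score; returns (theta_hat, standard error)."""
    if score_and_information(lo, items, responses)[0] <= 0.0:
        theta = lo  # MLE below lo or nonexistent (e.g. all-zero pattern)
    elif score_and_information(hi, items, responses)[0] >= 0.0:
        theta = hi  # MLE above hi or nonexistent (e.g. all-maximum pattern)
    else:
        left, right = lo, hi
        while right - left > tol:
            mid = 0.5 * (left + right)
            if score_and_information(mid, items, responses)[0] > 0.0:
                left = mid
            else:
                right = mid
        theta = 0.5 * (left + right)
    info = score_and_information(theta, items, responses)[1]
    return theta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    rng = random.Random(2024)
    items = make_mixed_test(20, 0.3, rng)  # 20 items, 30% free-response
    theta = rng.gauss(0.0, 1.0)            # true ability drawn from N(0, 1)
    responses = simulate_responses(theta, items, rng)
    theta_hat, se = mle_theta(items, responses)
    print(f"theta={theta:.3f}  theta_hat={theta_hat:.3f}  se={se:.3f}")
```

Repeating this for many examinees and replications per condition would produce the θ̂ and standard-error values that a classification-index method takes as input.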
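The results of this study are as follows. First, classification indices increased as the mixed-format test became longer. Second, classification indices increased as the proportion of free-response items increased, and this pattern was more pronounced for shorter tests. Third, the closer the cut score was to 0, the lower the classification accuracy and consistency indices, whereas the kappa coefficient showed the opposite pattern.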
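The conclusions drawn from these results are as follows. First, it is reasonable to use both methods to estimate the classification accuracy and consistency of mixed-format tests. For the kappa coefficient, however, caution is needed, because both methods may produce inaccurate values depending on test length and cut score location. Second, accurate and consistent evaluation of examinees' achievement levels requires mixed-format tests with a sufficient number of items. Third, when administering a mixed-format test with few items, attention should be paid to the ratio of multiple-choice to free-response items. Fourth, when the cut score lies at a low or high point in the examinee ability distribution, the Guo method performs relatively worse than the Rudner method.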