RISS Academic Research Information Service

      Studies of Rater and Item Effects in Rater Models.

      https://www.riss.kr/link?id=T15822136

      • Author
      • Publication details

        Ann Arbor : ProQuest Dissertations & Theses, 2020

      • Degree-granting institution

        Columbia University TC: Measurement and Evaluation

      • Year conferred

        2020

      • Language

        English

      • Keywords
      • Degree

        Ph.D.

      • Pages

        110 p.

      • Advisor / committee

        Advisor: DeCarlo, Lawrence T.

      Additional information

      Multilingual Abstract

      The goal of educational testing is to measure psychological constructs in a particular domain and to produce valid inferences about examinees’ ability. To evaluate ability precisely, test developers construct questions in different formats, such as multiple-choice (MC) items and open-ended or constructed-response (CR) items, for example, essay items. In recent years, large-scale assessments have implemented CR items in addition to MC items as an essential component of the educational assessment landscape. However, using CR items in testing involves two main challenges: rater effects and rater correlations. The first challenge is the error added by human raters’ subjective judgments, such as rater severity and rater central tendency. Rater severity refers to a rater’s tendency to give consistently low or high ratings, which biases ability estimates (Leckie & Baird, 2011). Central tendency refers to a rater’s tendency to use the middle categories of the scoring rubric and avoid the extreme ones (Saal et al., 1980). The second challenge is that multiple raters usually grade an examinee’s essay for quality control purposes; ratings of the same item are therefore correlated and must be handled by appropriate statistical procedures (Eckes, 2011; Kim, 2009). To address these problems, DeCarlo (2010) proposed the HRM-SDT model, which extended the traditional signal detection theory (SDT) model used in the first level of the hierarchical rater model (HRM). The HRM-SDT model not only accounts for the hierarchical structure of rating data but also handles various rater effects beyond rater severity. This research examined to what extent the HRM-SDT separates rater effects (i.e., rater severity and rater central tendency) from item effects (i.e., item difficulty).
Accordingly, one goal of this study was to simulate various rater and item effects to investigate how well the HRM-SDT model separates them. The other goal was to compare the fit of the HRM-SDT model with that of a model commonly used in language assessment, the Rasch model, under different simulation conditions and to examine how the two models differ in separating rater and item effects. To answer these questions, Simulation A and Simulation B were conducted. In Simulation A, seven sets of parameters were varied. Simulation B addressed some questions of particular interest using another four sets of parameters, in which the rater and item parameters were varied simultaneously. The study found that the HRM-SDT accurately recovered parameters and clearly detected and separated changes in rater severity, rater central tendency, and item difficulty in most conditions.
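
As a rough illustration of the two rater effects named in the abstract, the sketch below simulates ratings from a simple signal-detection-style rater. It is not the dissertation's HRM-SDT implementation. The function names, the four-category scale, the criteria locations, and the discrimination value `d` are all assumptions chosen for illustration. A rater's perception of an essay is taken to be `d * true_category` plus logistic noise, and the rating is the number of criteria that the perception exceeds.

```python
import math
import random


def simulate_rating(true_category, criteria, d=2.0, rng=random):
    """Draw one rating from an illustrative SDT-style rater (assumed form)."""
    # Logistic noise via the inverse CDF; clamp u away from 0 and 1.
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)
    noise = math.log(u / (1 - u))
    perception = d * true_category + noise
    # The rating is the count of criteria lying below the perception.
    return sum(perception > c for c in criteria)


def simulate_many(criteria, n=20000, d=2.0, seed=0):
    """Simulate n ratings for essays whose true category is uniform on 0..3."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(n):
        true_category = rng.randrange(4)
        ratings.append(simulate_rating(true_category, criteria, d, rng))
    return ratings


def mean(values):
    return sum(values) / len(values)


def extreme_share(ratings, lowest=0, highest=3):
    """Proportion of ratings that fall in the lowest or highest category."""
    return sum(r in (lowest, highest) for r in ratings) / len(ratings)


# Hypothetical criteria for three raters (values are assumptions):
neutral = simulate_many([1.0, 3.0, 5.0])    # criteria midway between categories
severe = simulate_many([2.5, 4.5, 6.5])     # all criteria shifted upward
central = simulate_many([-1.0, 3.0, 7.0])   # outer criteria pushed outward

print("mean rating, neutral vs severe:", mean(neutral), mean(severe))
print("extreme share, neutral vs central:",
      extreme_share(neutral), extreme_share(central))
```

Under these assumptions, shifting every criterion upward lowers the average rating (severity), while pushing the outer criteria apart concentrates ratings in the middle categories (central tendency).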