      The Use of Large Language Models to Predict Item Properties.


      https://www.riss.kr/link?id=T17164708

      • Author
      • Publication details

        Ann Arbor : ProQuest Dissertations & Theses, 2024

      • Degree-granting institution

        Michigan State University Measurement and Quantitative Methods - Doctor of Philosophy

      • Year conferred

        2024

      • Language

        English

      • Keywords
      • Country of publication

        United States of America

      • Degree

        Ph.D.

      • Pages

        162 p.

      • Advisor/Committee

        Advisor: Kelly, Kimberly.


      Multilingual Abstract

      Calibrating items is a crucial yet costly requirement, both for new tests and for existing ones whose items become outdated through changing relevance or overexposure. Traditionally, calibration involves administering items to a large number of participants, a process that requires substantial time and resources. To reduce these costs, researchers have sought alternative calibration methods. Before the emergence of Large Language Models (LLMs), these methods relied mainly on expert opinion or computational analysis of item features. Yet the accuracy of experts in predicting item performance has varied, and computational approaches often struggle to capture the intricate semantic details of test items.

      The emergence of LLMs might offer a new avenue for addressing the need for item calibration. These models, popularized by OpenAI (for example, the GPT series), have shown remarkable abilities in mimicking complex human thought processes and performing advanced reasoning tasks. Their achievements in passing sophisticated exams and executing cross-language translations underline their potential. However, their capacity for predicting item properties in test calibration has not been thoroughly investigated. Traditional calibration relies heavily on direct human interaction, such as pretesting and expert assessment, or on statistical modeling of item features through resource-intensive machine learning algorithms. This dissertation explores the potential of LLMs to predict item characteristics, a task that has traditionally required human insight or complex statistical models. With the increasing accessibility of high-performance LLMs from organizations like OpenAI, Meta, and Google, and through open-source platforms such as HuggingFace.com, there is promising ground for investigation. This study examines whether LLMs could replace human effort in item calibration tasks.

      To evaluate the effectiveness of LLMs in predicting item properties, this dissertation implements a training and testing framework that assesses both the relative and absolute difficulties of items. It undertakes three theoretical investigations: first, examining the ability of LLMs to predict the relative difficulty of items; second, assessing the feasibility of using multiple LLMs as substitutes for test-takers and using their responses as predictors of item difficulty; and third, applying a search algorithm, guided by LLM predictions of relative difficulty, to ascertain absolute difficulties.

      The findings indicate that the models are statistically significant predictors of relative item difficulty but have modest explanatory power, with adjusted R-squared values of around 5-10%. However, the application of LLMs to predicting relative item difficulties through pairwise comparisons proves more promising, achieving a pairwise accuracy of about 62% and yielding correlations with item difficulty ranging between 0.36 and 0.42.

      This suggests that although LLMs show potential in certain aspects of item calibration, their effectiveness varies with the specific task. The result is promising and warrants further exploration of the capabilities of LLMs for item calibration, potentially leading to more efficient and cost-effective methods of test development and maintenance.
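
      As an illustration of the pairwise accuracy figure reported in the abstract, the short Python sketch below shows one common way such a metric can be computed: for every pair of items with different known difficulties, check whether the predicted "harder" item matches the truly harder one. The item identifiers, difficulty values, and the `toy_predictor` lookup are hypothetical stand-ins for LLM judgments; none of them come from the dissertation, and the dissertation's own procedure may differ.

```python
from itertools import combinations


def pairwise_accuracy(true_difficulty, predicted_harder):
    """Fraction of item pairs whose predicted ordering matches the true one.

    true_difficulty: dict mapping item id -> known difficulty.
    predicted_harder: function (a, b) -> id of the item predicted to be harder.
    Pairs with tied true difficulty are skipped.
    """
    correct = 0
    total = 0
    for a, b in combinations(sorted(true_difficulty), 2):
        if true_difficulty[a] == true_difficulty[b]:
            continue  # no well-defined "harder" item for a tie
        truly_harder = a if true_difficulty[a] > true_difficulty[b] else b
        total += 1
        if predicted_harder(a, b) == truly_harder:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical calibrated difficulties (illustrative values only).
true_difficulty = {"item1": -1.2, "item2": 0.3, "item3": 0.9, "item4": 1.5}

# Hypothetical pairwise judgments standing in for LLM output.
toy_judgments = {
    ("item1", "item2"): "item2",
    ("item1", "item3"): "item3",
    ("item1", "item4"): "item1",  # deliberately wrong
    ("item2", "item3"): "item2",  # deliberately wrong
    ("item2", "item4"): "item4",
    ("item3", "item4"): "item4",
}


def toy_predictor(a, b):
    """Look up the stored judgment for the pair (a, b)."""
    return toy_judgments[(a, b)]


accuracy = pairwise_accuracy(true_difficulty, toy_predictor)
print(f"pairwise accuracy: {accuracy:.2f}")
```

      In this toy setup four of the six pairs are ordered correctly. A predictor that guessed at random would be expected to score about 50% on this metric, which is the natural baseline against which a figure such as 62% can be read.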