RISS Search - Theses and Dissertations

1
Enhancing Mathematical Word Problem Solving with Numerical Masking Strategy in Pre-trained Encoder-Decoder Language Models

Nilesh Kumar Srivastava, University of Science and Technology (UST), Korea Institute of Science and Technology Information (KISTI), 2024, master's thesis (Korea)

Korean title: 수치 마스킹 전략을 이용한 사전학습 인코더-디코더 언어모델의 수학 문제 풀이 성능 개선

Addressing Mathematical Word Problems (MWPs) within the domain of Natural Language Processing (NLP) is a challenging task. While decoder-based Large Language Models (LLMs) have demonstrated potential in MWP solving, their substantial size can be a hindrance. MWPs demand a unique fusion of language understanding and generation, making encoder-decoder models an enticing solution. However, these models grapple with specific complexities, especially the handling of numerical values and mathematical relationships embedded in the context. In response, this study introduces a novel masking approach explicitly tailored for MWP solving, departing from the limitations of random masking. This approach trains the encoder-decoder framework to better predict essential numerical components and mathematical relationships within MWPs.

This research compares the efficacy of the proposed masking approach with that of the conventional random masking methodology. It also examines how varying masking percentages and span lengths during continual training affect performance on MWPs. A series of extensive experiments identifies the configurations that contribute most to mathematical reasoning capacity on MWPs. The results show substantial performance improvements across a range of datasets, including GSM8K, SVAMP, and MultiArith. Accuracy on the SVAMP dataset rises from an initial 12.90% to 43%. Similarly, accuracy on the MultiArith dataset rises from 26.88% to 49.53%. These findings underscore the effectiveness of the proposed masking approach in improving the mathematical proficiency of encoder-decoder models dedicated to MWP solving.

This study not only establishes the superiority of the proposed masking approach but also provides insights into the dynamics of continual training, the impact of loss functions, and their combined influence on the proficiency of encoder-decoder models in MWP solving. The variations observed across different loss function combinations add depth to the research, contributing to a broader understanding of how to optimize models for MWP-solving tasks.

Keywords: Mathematical Word Problems (MWPs); Language Models (LM); Masking Numerical Tokens; Random Masking; Mathematical Reasoning; Encoder-Decoder Models; Continual Training.
