Machine Reading Comprehension with BERT on a Korean Question Answering Dataset
Seohyeong Jeong, Nojun Kwak. The Institute of Electronics and Information Engineers (IEIE), 2020 Conference Proceedings, Vol.2020 No.11
Machine reading comprehension (MRC) on non-English datasets has not been explored as thoroughly as on English datasets. In this work, we demonstrate a BERT-based question answering system on KorQuAD 2.0, a Korean question answering dataset. We explore KorQuAD 2.0 with the multilingual BERT model released by Google and with Larva-base, a language model pre-trained on a Korean corpus and released by Naver, using an additional tokenizer. We also adopt negative sampling during training to balance the ratio of positive and negative data samples, and a larger window stride during inference to increase inference speed. As a result, we achieve an exact match (EM) score of 58.21% and an F1 score of 77.33%, which largely outperforms the previous baseline of 30.24% EM and 45.96% F1.
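The two tricks mentioned in the abstract (negative sampling of training windows and a tunable window stride at inference) can be illustrated with a minimal sketch. This is not the authors' code; the function names, window size, and stride values below are illustrative assumptions about the standard sliding-window setup used for long-document MRC.

```python
import random

def make_windows(doc_len, window_size, stride):
    # Split token positions [0, doc_len) into overlapping windows.
    # A larger stride yields fewer windows, hence faster inference.
    windows, start = [], 0
    while True:
        end = min(start + window_size, doc_len)
        windows.append((start, end))
        if end == doc_len:
            break
        start += stride
    return windows

def label_windows(windows, answer_start, answer_end):
    # A window is a positive sample iff it fully contains the answer span;
    # all other windows are negatives (no-answer samples).
    return [(w, w[0] <= answer_start and answer_end <= w[1]) for w in windows]

def negative_sample(labeled, neg_per_pos, rng):
    # Keep every positive window; randomly downsample negatives so the
    # negative:positive ratio is at most neg_per_pos (balances training data).
    pos = [w for w, is_pos in labeled if is_pos]
    neg = [w for w, is_pos in labeled if not is_pos]
    keep = min(len(neg), neg_per_pos * max(len(pos), 1))
    return pos + rng.sample(neg, keep)

rng = random.Random(0)
# Hypothetical example: a 2000-token document, 512-token windows, stride 128.
windows = make_windows(doc_len=2000, window_size=512, stride=128)
labeled = label_windows(windows, answer_start=900, answer_end=950)
train = negative_sample(labeled, neg_per_pos=1, rng=rng)
```

With these numbers the document splits into 13 overlapping windows, of which 4 contain the answer span; downsampling keeps all 4 positives plus 4 negatives, giving a balanced 1:1 training set. At inference time, no answer span is known, so one would instead enlarge the stride to reduce the number of windows scored per document.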