Document-grounded conversation is a conversation between two or more speakers that is based on a given document. A document-based dialogue system generates a response to the last utterance of a dialogue, and various document-based dialogue datasets in English have been released and actively studied. In Korean, by contrast, there has been no active research, because no Korean document-based conversation dataset has been available. KoDoc2Dial, a Korean translation of the English document-based conversation dataset Doc2dial, was recently released, but it contains noise introduced during the translation process. This noise should be reduced, because noisy datasets can negatively affect both training and the consistency of the resulting system. In this paper, we propose a method that reduces the noise in KoDoc2Dial by filtering it with reverse translation (back-translation). In our experiments, the proposed method improved SacreBLEU by about 3.6%p compared with the dataset before filtering.
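As a rough illustration of filtering by reverse translation, the sketch below translates each Korean sentence back into English, compares the result with the original English sentence, and keeps only the pairs that survive the round trip well. This is a minimal sketch under stated assumptions, not the procedure used in this paper: the `back_translate` callable, the token-overlap score based on `difflib`, and the `threshold` value of 0.6 are all hypothetical placeholders, since the abstract does not specify the translation model, the scoring metric, or the cutoff.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


def round_trip_similarity(source_en: str, restored_en: str) -> float:
    """Token-level similarity in [0, 1] between the original English
    sentence and its round-trip (back-translated) version.

    Assumption: difflib is only a stand-in for whatever sentence-level
    metric is actually used to score translation quality.
    """
    source_tokens = source_en.lower().split()
    restored_tokens = restored_en.lower().split()
    return SequenceMatcher(None, source_tokens, restored_tokens).ratio()


def filter_by_back_translation(
    pairs: Iterable[Tuple[str, str]],
    back_translate: Callable[[str], str],
    threshold: float = 0.6,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep (english, korean) pairs whose Korean side back-translates
    to something close to the original English sentence."""
    kept = []
    for source_en, target_ko in pairs:
        # Hypothetical Korean-to-English translation function.
        restored_en = back_translate(target_ko)
        if round_trip_similarity(source_en, restored_en) >= threshold:
            kept.append((source_en, target_ko))
    return kept
```

In practice `back_translate` would wrap a Korean-to-English machine translation system, and the threshold would be tuned on held-out data. A stricter threshold removes more noise but also discards more usable training pairs.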