Denoising Method for Document Grounded Conversation Datasets via Back Translation Process
Damrin Kim, Boeun Kim, Youngjin Jang, Harksoo Kim
http://doi.org/10.5626/JOK.2024.51.1.34
Document-grounded conversation is a conversation between two or more speakers that is based on a given document. A document-grounded dialogue system generates a response to the last utterance of a dialogue, and various English document-grounded dialogue datasets have been released and actively studied. In Korean, however, there has been little active research because no Korean document-grounded conversation dataset was available. KoDoc2Dial, a Korean translation of the English document-grounded conversation dataset Doc2Dial, was recently released, but it contains noise introduced during the translation process. This noise should be reduced because noisy datasets can negatively affect both training and system consistency. In this paper, we propose a method that reduces the noise in KoDoc2Dial by filtering with a back-translation process. In experiments, the proposed method improved SacreBLEU by about 3.6 percentage points (%p) compared with the dataset before filtering.
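The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of one common way back-translation filtering can be done, not the authors' implementation. It assumes each Korean utterance is translated back into English and kept only if the round-trip text is close enough to the original English. The names `token_f1` and `filter_by_back_translation`, the token-overlap F1 score, and the threshold of 0.5 are all hypothetical choices made for illustration; the `back_translate` callable stands in for whatever Korean-to-English translation system is available.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def token_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two lowercased, whitespace-tokenized strings.

    This is an illustrative similarity score, not the metric from the paper.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(ref) & Counter(cand)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def filter_by_back_translation(
    pairs: Iterable[Tuple[str, str]],
    back_translate: Callable[[str], str],
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> List[Tuple[str, str]]:
    """Keep (English source, Korean translation) pairs whose round trip
    back into English stays close to the original English text."""
    kept = []
    for source_en, translated_ko in pairs:
        round_trip_en = back_translate(translated_ko)
        if token_f1(source_en, round_trip_en) >= threshold:
            kept.append((source_en, translated_ko))
    return kept


if __name__ == "__main__":
    # Stub standing in for a real Korean-to-English translation model.
    stub_model = {
        "운전면허를 어떻게 갱신하나요?": "how do I renew my license",
        "날씨가 좋네요.": "the weather is nice",
    }
    data = [
        ("How do I renew my license", "운전면허를 어떻게 갱신하나요?"),
        ("How do I renew my license", "날씨가 좋네요."),
    ]
    for pair in filter_by_back_translation(data, stub_model.__getitem__):
        print(pair)
```

In practice the threshold would need to be tuned on held-out data, and a stronger similarity measure (for example a sentence-level BLEU or an embedding-based score) could replace the token overlap used here.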
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal