Denoising Method for Document Grounded Conversation Datasets via Back Translation Process

Damrin Kim, Boeun Kim, Youngjin Jang, Harksoo Kim

http://doi.org/10.5626/JOK.2024.51.1.34

Document Grounded Conversation is a conversation between two or more speakers based on a given document. A document-based dialogue system generates a response to the last utterance of a dialogue, and various document-based dialogue datasets in English have been released and actively studied. In Korean, however, no active research has been conducted because a document-based conversation dataset has not been available. KoDoc2Dial, a Korean translation of the English document-based conversation dataset Doc2dial, was recently released, but it contains noise generated during the translation process. This noise should be reduced because noisy datasets can negatively affect both training and system consistency. In this paper, we propose a method for reducing the noise in KoDoc2Dial by filtering it through a back translation process. In our experiments, the proposed method improved SacreBLEU by about 3.6%p compared with the results before filtering.
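
The general idea of filtering by back translation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the translation model, the similarity measure, or the threshold, so all three are assumptions here. `back_translate` is a hypothetical callable that maps Korean text back to English, `token_f1` is a simple unigram-overlap score standing in for whatever metric the paper uses, and the threshold of 0.5 is arbitrary.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def token_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two strings (assumed stand-in metric)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def filter_by_back_translation(
    pairs: Iterable[Tuple[str, str]],
    back_translate: Callable[[str], str],  # hypothetical Korean -> English translator
    threshold: float = 0.5,                # arbitrary cut-off, chosen for illustration
) -> List[Tuple[str, str]]:
    """Keep (English source, Korean translation) pairs whose round trip stays close."""
    kept = []
    for source_en, translated_ko in pairs:
        round_trip_en = back_translate(translated_ko)
        if token_f1(source_en, round_trip_en) >= threshold:
            kept.append((source_en, translated_ko))
    return kept


if __name__ == "__main__":
    # Toy lookup table in place of a real translation model.
    fake_model = {
        "ko-utterance-1": "how do i renew my license",
        "ko-utterance-2": "banana truck",
    }
    data = [
        ("how do i renew my license", "ko-utterance-1"),
        ("where is the office", "ko-utterance-2"),
    ]
    print(filter_by_back_translation(data, fake_model.get))
```

In this toy run only the first pair survives, because the round trip of the second pair shares no tokens with its English source. A real pipeline would replace the lookup table with a translation model and the overlap score with a stronger similarity measure.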


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr