TY - JOUR
T1 - Document-level Machine Translation Data Augmentation Using a Cluster Algorithm and NSP
AU - Kim, Dokyoung
AU - Lee, Changki
JO - Journal of KIISE, JOK
PY - 2023
DA - 2023/1/14
DO - 10.5626/JOK.2023.50.5.401
KW - neural machine translation
KW - document-level machine translation
KW - data augmentation
KW - G-Transformer
KW - NSP (Next Sentence Prediction)
AB - In recent years, research on document-level machine translation has been actively conducted with the aim of using the context of the entire document to produce natural translations. As with sentence-level machine translation models, training a document-level machine translation model requires a large amount of training data, but building a large document-level parallel corpus is very difficult. In this paper, we therefore propose a data augmentation technique for document-level machine translation that addresses the shortage of document-level parallel corpora. In our experiments, applying the data augmentation technique, which uses a cluster algorithm and NSP, to a sentence-level parallel corpus without context improved document-level machine translation performance by 3.0 S-BLEU and 2.7 D-BLEU compared with performance before the technique was applied.