TY  - JOUR
T1  - Training Data Augmentation Technique for Machine Comprehension by Question-Answer Pairs Generation Models based on a Pretrained Encoder-Decoder Model
AU  - Shin, Hyeonho
AU  - Choi, Sung-Pil
JO  - Journal of KIISE, JOK
PY  - 2022
DA  - 2022/1/14
DO  - 10.5626/JOK.2022.49.2.166
KW  - data augmentation
KW  - machine reading comprehension
KW  - natural language processing
KW  - answer extraction
KW  - question generation
AB  - The goal of Machine Reading Comprehension (MRC) research is to find answers to questions in documents. MRC research requires large-scale, high-quality data, but individual researchers and small research institutes are limited in their ability to construct such data. To overcome this limitation, this paper proposes an MRC data augmentation technique based on a pretrained language model. The technique consists of a question-answer (Q&A) pair generation model and a data validation model. The Q&A pair generation model comprises an answer extraction model and a question generation model, both built by fine-tuning BART. The data validation model is added to increase the reliability of the augmented data by verifying the generated pairs; it is built by fine-tuning ELECTRA as an MRC model. To measure how much the augmentation improves MRC performance, we applied the technique to the KorQuAD v1.0 data. In the experiments, compared with the previous model, the Exact Match (EM) score increased by up to 7.2 points and the F1 score by up to 5.7 points.