TY  - JOUR
T1  - Training Data Augmentation Technique for Machine Comprehension by Question-Answer Pairs Generation Models based on a Pretrained Encoder-Decoder Model
AU  - Shin, Hyeonho 
AU  - Choi, Sung-Pil 
JO  - Journal of KIISE, JOK
PY  - 2022
DA  - 2022/1/14
DO  - 10.5626/JOK.2022.49.2.166
KW  - data augmentation
KW  - machine reading comprehension
KW  - natural language processing
KW  - answer extraction
KW  - question generation
AB  - The goal of Machine Reading Comprehension (MRC) research is to find answers to questions in documents. MRC research requires large-scale, high-quality data, but individual researchers and small research institutes are limited in their ability to construct such data. To overcome this limitation, we propose an MRC data augmentation technique using a pretrained language model. The technique consists of a Q&A pair generation model and a data validation model. The Q&A pair generation model consists of an answer extraction model and a question generation model, both constructed by fine-tuning BART. The data validation model is added to increase the reliability of the augmented data by verifying the generated examples; it is an MRC model obtained by fine-tuning ELECTRA. To measure how much the data augmentation technique improves the MRC model, we applied it to the KorQuAD v1.0 dataset. In the experiments, compared with the previous model, the Exact Match (EM) score increased by up to 7.2 and the F1 score by up to 5.7.