Training Data Augmentation Technique for Machine Comprehension by Question-Answer Pairs Generation Models based on a Pretrained Encoder-Decoder Model

Hyeonho Shin; Sung-Pil Choi

Vol. 49, No. 2, pp. 166-175, Feb. 2022

DOI: 10.5626/JOK.2022.49.2.166

Keywords: data augmentation; Machine Reading Comprehension; Natural Language Processing; answer extraction; Question generation

Abstract

The goal of Machine Reading Comprehension (MRC) research is to find the answers to questions within given documents. MRC research requires large-scale, high-quality data; however, individual researchers and small research institutes are limited in their ability to construct such data. To overcome this limitation, this paper proposes an MRC data augmentation technique that uses a pre-trained language model. The technique consists of a Q&A pair generation model and a data validation model. The Q&A pair generation model is in turn composed of an answer extraction model and a question generation model, both built by fine-tuning BART. The data validation model is added to increase the reliability of the augmented data by verifying the generated pairs; it is built by fine-tuning ELECTRA as an MRC model. To measure how much the augmentation improves an MRC model, we applied the technique to the KorQuAD v1.0 data. In the experiments, compared with the previous model, the Exact Match (EM) score increased by up to 7.2 points and the F1 score by up to 5.7 points.
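The generate-then-validate pipeline described above can be illustrated with a minimal Python sketch. It rests on stated assumptions: the fine-tuned BART answer extractor, the BART question generator, and the ELECTRA MRC validator are replaced by hypothetical callables (`extract_answers`, `generate_question`, `mrc_predict`). A generated pair is assumed to be kept only when the validator's predicted answer agrees with the extracted answer at or above a hypothetical threshold `min_f1`. The abstract does not state the paper's actual acceptance criterion. The EM and F1 functions are simplified SQuAD-style versions; the official KorQuAD v1.0 evaluation script may normalize and tokenize text differently.

```python
import string
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List


def normalize(text: str) -> str:
    """Simplified normalization: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer (SQuAD-style)."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


@dataclass
class QAPair:
    context: str
    question: str
    answer: str


def augment(
    contexts: Iterable[str],
    extract_answers: Callable[[str], List[str]],   # hypothetical stand-in for the BART answer extractor
    generate_question: Callable[[str, str], str],  # hypothetical stand-in for the BART question generator
    mrc_predict: Callable[[str, str], str],        # hypothetical stand-in for the ELECTRA MRC validator
    min_f1: float = 1.0,                           # hypothetical acceptance threshold
) -> List[QAPair]:
    """Generate (question, answer) pairs per context and keep only validated ones."""
    kept: List[QAPair] = []
    for context in contexts:
        for answer in extract_answers(context):
            question = generate_question(context, answer)
            # Validation: the MRC model must recover the extracted answer from the question.
            predicted = mrc_predict(context, question)
            if f1(predicted, answer) >= min_f1:
                kept.append(QAPair(context, question, answer))
    return kept


if __name__ == "__main__":
    ctx = "Seoul is the capital of South Korea."
    pairs = augment(
        [ctx],
        extract_answers=lambda c: ["Seoul", "South Korea"],
        generate_question=lambda c, a: f"Which span is '{a}'?",
        mrc_predict=lambda c, q: "Seoul",  # toy validator that always answers "Seoul"
    )
    for p in pairs:
        print(p.question, "->", p.answer)
```

With `min_f1=1.0` the filter behaves like an exact-match check after normalization, so in the toy run only the "Seoul" pair survives. Lowering the threshold would admit partially matching pairs and trade data quality for quantity.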

Cite this article

[IEEE Style]

H. Shin and S. Choi, "Training Data Augmentation Technique for Machine Comprehension by Question-Answer Pairs Generation Models based on a Pretrained Encoder-Decoder Model," Journal of KIISE, JOK, vol. 49, no. 2, pp. 166-175, 2022. DOI: 10.5626/JOK.2022.49.2.166.

[ACM Style]

Hyeonho Shin and Sung-Pil Choi. 2022. Training Data Augmentation Technique for Machine Comprehension by Question-Answer Pairs Generation Models based on a Pretrained Encoder-Decoder Model. Journal of KIISE, JOK, 49, 2, (2022), 166-175. DOI: 10.5626/JOK.2022.49.2.166.

[KCI Style]

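Hyeonho Shin, Sung-Pil Choi, "Training Data Augmentation Technique for Machine Reading Comprehension through Question-Answer Pair Generation Based on a Pre-trained Encoder-Decoder Model," Journal of KIISE, Vol. 49, No. 2, pp. 166-175, 2022. DOI: 10.5626/JOK.2022.49.2.166.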

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal