Journal of KIISE

Search : [ keyword: 정보추출 (information extraction) ] (3)

Multi-task Learning Approach Based on Pre-trained Language Models Using Temporal Relations

In research on natural language understanding (NLU) that aims to produce a single model able to perform multiple tasks with good general performance, various multi-task learning techniques are being explored. In addition, documents written in natural language typically contain time-related information, and accurately recognizing such information is essential for understanding the overall content and context of a document. In this paper, we propose a multi-task learning technique that incorporates a temporal relation extraction task into the training of NLU tasks in order to use the temporal context of Korean input sentences. To reflect the characteristics of multi-task learning, a new temporal relation extraction task is designed, and the model is configured to learn it jointly with existing NLU tasks. In the experiments, we trained various task combinations and analyzed how performance differs when temporal relations are included, compared with using only the existing NLU tasks. The experimental results show that the multi-task combinations generally outperform the individual tasks and that, in particular, combining temporal relation extraction with named entity recognition greatly improves performance.

2-Phase Passage Re-ranking Model based on Neural-Symbolic Ranking Models

Yongjin Bae, Hyun Kim, Joon-Ho Lim, Hyun-ki Kim, Kong Joo Lee

http://doi.org/10.5626/JOK.2021.48.5.501

Previous research on QA systems has focused on extracting exact answers for given questions and passages. However, when the problem is expanded from machine reading comprehension to open-domain question answering, finding the passage that contains the correct answer is as important as reading comprehension itself. DrQA reported that Exact Match@Top1 performance decreased from 69.5 to 27.1 when the QA system included an initial search step. In this work, we propose a 2-phase passage re-ranking model to improve the performance of the question answering system. The proposed model integrates the results of symbolic and neural ranking models and re-ranks them. The symbolic ranking model was trained with the CatBoost algorithm on manually designed features of the question and passage. The neural model was obtained by fine-tuning the KorBERT model. The second-stage model is a neural regression model. We maximized performance by combining ranking models with different characteristics. The proposed model achieved an MRR of 85.8% and a BinaryRecall@Top1 of 82.2% in an evaluation on 1,000 questions, improvements of 17.3% (MRR) and 22.3% (BR@Top1) over the baseline model.
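As a rough illustration of the second-phase idea and the two reported metrics, the following sketch combines per-passage scores from two rankers and evaluates the resulting order. It is not the authors' implementation: the fixed weighted sum in `rerank` is a stand-in for the neural regression model, the toy scores are invented, and BinaryRecall@Top1 is assumed here to mean the fraction of questions whose top-ranked passage contains the answer.

```python
from typing import List, Sequence


def rerank(symbolic: Sequence[float], neural: Sequence[float],
           weight: float = 0.5) -> List[int]:
    """Return passage indices sorted by combined score, best first.

    Assumption: a weighted sum replaces the paper's second-stage
    neural regression model, which is not specified in the abstract.
    """
    combined = [weight * s + (1 - weight) * n for s, n in zip(symbolic, neural)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


def mean_reciprocal_rank(ranked_relevance: Sequence[Sequence[bool]]) -> float:
    """Average of 1/rank of the first relevant passage (0 if none is found)."""
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)


def binary_recall_at_1(ranked_relevance: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked passage is relevant (assumed definition)."""
    hits = sum(1 for flags in ranked_relevance if flags and flags[0])
    return hits / len(ranked_relevance)


# Toy data (hypothetical): two questions with three candidate passages each.
questions = [
    {"symbolic": [0.2, 0.9, 0.4], "neural": [0.1, 0.8, 0.7], "answer_idx": 1},
    {"symbolic": [0.9, 0.1, 0.3], "neural": [0.2, 0.3, 0.9], "answer_idx": 0},
]

relevance = []
for q in questions:
    order = rerank(q["symbolic"], q["neural"])
    relevance.append([idx == q["answer_idx"] for idx in order])

mrr = mean_reciprocal_rank(relevance)
br_at_1 = binary_recall_at_1(relevance)
print(f"MRR={mrr:.3f}  BR@Top1={br_at_1:.3f}")
```

In the toy data the answer passage is ranked first for one question and second for the other, so the two metrics diverge: MRR gives partial credit for the second-place hit, while the top-1 measure does not.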

Metadata Extraction based on Deep Learning from Academic Paper in PDF

Seon-Wu Kim, Seon-Yeong Ji, Hee-Seok Jeong, Hwa-Mook Yoon, Sung-Pil Choi

http://doi.org/10.5626/JOK.2019.46.7.644

Recently, with the rapid increase in the number of academic documents, a need has arisen for academic database services that provide information about the latest research trends. Although automated metadata extraction for academic database construction has been studied, most academic texts are distributed in PDF format, which makes it difficult to extract information automatically. In this paper, we propose an automatic metadata extraction method for PDF documents. First, the PDF is transformed into XML format, and the coordinates, size, width, and text features of each XML markup token are extracted and assembled into a vector. The extracted features are then analyzed using a Bidirectional GRU-CRF, a deep learning model specialized for sequence labeling, and finally the metadata are extracted. In this study, 10 domestic journals were selected, and a training set for metadata extraction was constructed and used in experiments with the proposed method. In the extraction experiment on 9 kinds of metadata, the method obtained 88.27% accuracy and an F1 score of 84.39%.
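The feature-construction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the XML layout (`<token>` elements with `x`, `y`, `size`, and `width` attributes) and the particular text features are hypothetical, since the abstract does not specify the converter's output format or the exact feature set. The resulting per-token vectors would be the input sequence to a labeling model such as a Bidirectional GRU-CRF, which is not shown.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical XML as it might come out of a PDF-to-XML converter.
SAMPLE_XML = """
<page>
  <token x="72.0" y="90.5" size="18.0" width="210.4">Metadata</token>
  <token x="72.0" y="120.0" size="10.0" width="35.2">2019</token>
</page>
"""


def token_features(token: ET.Element) -> List[float]:
    """Build one feature vector from a token's layout attributes and text."""
    text = (token.text or "").strip()
    return [
        float(token.get("x")),       # horizontal coordinate
        float(token.get("y")),       # vertical coordinate
        float(token.get("size")),    # font size
        float(token.get("width")),   # rendered width
        float(text[:1].isupper()),   # text feature: starts with a capital letter
        float(any(ch.isdigit() for ch in text)),  # text feature: contains a digit
        float(len(text)),            # text feature: character count
    ]


def page_to_sequence(xml_string: str) -> List[List[float]]:
    """Convert every <token> in document order into a feature vector."""
    root = ET.fromstring(xml_string.strip())
    return [token_features(tok) for tok in root.iter("token")]


sequence = page_to_sequence(SAMPLE_XML)
for vector in sequence:
    print(vector)
```

Keeping layout features (position, size, width) alongside text features reflects the abstract's point that metadata such as titles and dates are distinguished partly by where and how they are printed on the page.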

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr

Digital Library[ Search Result ]

Multi-task Learning Approach Based on Pre-trained Language Models Using Temporal Relations

2-Phase Passage Re-ranking Model based on Neural-Symbolic Ranking Models

Metadata Extraction based on Deep Learning from Academic Paper in PDF
