Search: [author: 이현구 (Hyeon-gu Lee)] (8 results)

Passage Re-ranking Method Based on Sentence Similarity Through Multitask Learning

Youngjin Jang, Hyeon-gu Lee, Jihyun Wang, Chunghee Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2020.47.4.416

A machine reading comprehension (MRC) system is a question-answering system in which a computer reads a given passage and answers questions about it. With recent advances in deep neural networks, MRC research has been actively pursued, and work is now under way on open-domain MRC systems, which find the correct answer in passages returned by an information retrieval (IR) model rather than in a single given passage. However, if the IR model fails to retrieve a passage containing the correct answer, the MRC system cannot answer the question. In other words, the performance of an open-domain MRC system depends on the performance of its IR model, so a high-performance IR model is a prerequisite for a high-performance open-domain MRC system. Previous work has improved IR models through query expansion and re-ranking. In this paper, we propose a re-ranking method using deep neural networks. The proposed model re-ranks the retrieved passages using sentence similarity learned through multi-task learning. In experiments on 58,980 pairs of MRC data, it improves performance by approximately 8% over the existing IR model.
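
As a rough illustration of re-ranking, the sketch below reorders retrieved passages by interpolating each passage's original IR score with a question-passage similarity score. The abstract does not describe the similarity model beyond multi-task learning, so a bag-of-words cosine similarity stands in for it here; the function names and the interpolation weight `alpha` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for a learned sentence-similarity model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def rerank(question: str, passages: list[tuple[str, float]], alpha: float = 0.5) -> list[str]:
    """Re-rank (passage, ir_score) pairs by alpha * ir_score + (1 - alpha) * similarity.

    alpha is a hypothetical interpolation weight; IR scores are assumed to lie in [0, 1].
    """
    scored = [
        (alpha * ir + (1 - alpha) * cosine(question, text), text)
        for text, ir in passages
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

With `alpha=1.0` the original IR order is kept; lowering `alpha` lets a passage with a weaker IR score but a closer match to the question move up.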

Effective Generative Chatbot Model Trainable with a Small Dialogue Corpus

Jintae Kim, Hyeon-gu Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2019.46.3.246

Unlike the more common retrieval-based chatbot models, generative chatbot models do not depend on predefined responses; instead, they generate new responses using well-trained neural networks. However, they require a large training corpus of query-response pairs. If the training corpus is insufficient, they make grammatical errors caused by out-of-vocabulary words or data sparseness, mostly in longer sentences. To overcome this challenge, we propose a chatbot model based on a sequence-to-sequence neural network that uses a mixture of words and syllables as encoding and decoding units. Moreover, we propose a two-step training procedure: pre-training on a large non-dialogue corpus and retraining on a smaller dialogue corpus. In experiments with a small dialogue corpus (47,089 query-response pairs for training and 3,000 query-response pairs for evaluation), the proposed encoding-decoding units reduced the out-of-vocabulary problem, and the two-step training method improved scores on measures such as BLEU and ROUGE.
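
The abstract does not give the exact rule for mixing words and syllables. One plausible sketch, assuming that words in a fixed vocabulary stay whole and any other word is split into syllables (for Hangul, one syllable per character), is shown below; the function names and the `##` continuation marker are hypothetical.

```python
def to_mixed_units(words: list[str], vocab: set[str]) -> list[str]:
    """Keep in-vocabulary words whole; split other words into syllables.

    For Hangul text each character is one syllable block. Non-initial syllables
    get a hypothetical "##" prefix so the original word boundaries can be restored.
    """
    units: list[str] = []
    for word in words:
        if word in vocab:
            units.append(word)
        else:
            units.extend(ch if i == 0 else "##" + ch for i, ch in enumerate(word))
    return units


def from_mixed_units(units: list[str]) -> list[str]:
    """Invert to_mixed_units by gluing '##'-prefixed syllables onto the previous unit."""
    words: list[str] = []
    for unit in units:
        if unit.startswith("##") and words:
            words[-1] += unit[2:]
        else:
            words.append(unit)
    return words
```

Because every rare word decomposes into a small, closed set of syllables, the decoder can still spell it out instead of emitting an unknown-word token.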

Korean Machine Reading Comprehension using Reinforcement Learning and Dual Co-Attention Mechanism

Hyeon-gu Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2018.45.9.932

Machine reading comprehension (MRC) is a question-answering task in which a model understands a given document and then finds the correct answer within it. Previous MRC models have been end-to-end neural network models with various attention mechanisms. However, because these models did not use grammatical or syntactic information, they had difficulty finding answers that involve long dependencies between lexical clues. To resolve this problem, we propose an MRC model with a dual co-attention mechanism that reflects part-of-speech information and shortest-dependency-path information. In addition, to further improve performance, we propose a reinforcement learning method that uses the F1-scores of answer extraction as rewards. In experiments with 18,863 question-answer pairs, the proposed model outperformed the representative previous model (exact match: 0.4566, F1-score: 0.7290).
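
Using F1 as a reward means scoring each extracted answer against the gold answer by token overlap. The sketch below computes a SQuAD-style token-level F1 with whitespace tokenization; the abstract does not specify the tokenization (Korean MRC systems often use morphemes or syllables instead), so that part is an assumption, and `f1_reward` is a hypothetical name.

```python
from collections import Counter


def f1_reward(predicted: str, gold: str) -> float:
    """Token-level F1 between a predicted answer span and the gold answer."""
    pred_tokens, gold_tokens = predicted.split(), gold.split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Unlike exact match, this reward is graded, so in a policy-gradient setup a partially correct span would still receive some credit.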

Knowledge Embedding Method for Implementing a Generative Question-Answering Chat System

Sihyung Kim, Hyeon-gu Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2018.45.2.134

A chat system is a computer program that understands users’ miscellaneous utterances and generates appropriate responses. Sometimes a chat system needs to answer users’ simple information-seeking questions. However, previous generative chat systems do not consider how to embed knowledge entities (i.e., the subjects and objects of knowledge triples), which are essential for question answering. Previous chat models have the disadvantage of generating the same responses even when the knowledge entities in users’ utterances change. To alleviate this problem, we propose a knowledge entity embedding method that improves the question-answering accuracy of a generative chat system. The proposed method uses a Siamese recurrent neural network to embed knowledge entities and their synonyms. For the experiments, we implemented a sequence-to-sequence model in which subjects and predicates are encoded and objects are decoded. The proposed embedding method achieved 12.48% higher accuracy than a conventional embedding method based on a convolutional neural network.
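
A Siamese network encodes both inputs with the same weights and is trained so that related pairs (here, an entity and its synonym) end up close together in the embedding space. The sketch below shows only the pair-wise training objective, using the standard contrastive loss as an assumed stand-in; the abstract does not state which loss the authors used, and the margin value is hypothetical.

```python
import math


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u: list[float], v: list[float], is_synonym: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one pair of embeddings from a shared (Siamese) encoder.

    Synonym pairs are pulled together (loss = d^2); non-synonym pairs are pushed
    apart until their distance reaches the margin (loss = max(0, margin - d)^2).
    """
    d = euclidean(u, v)
    if is_synonym:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Since `u` and `v` come from the same encoder, minimizing this loss shapes a single embedding function in which synonymous entity names map to nearby vectors.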

Title Generation Model for which Sequence-to-Sequence RNNs with Attention and Copying Mechanisms are used

Hyeon-gu Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2017.44.7.674

In big-data environments, where large numbers of text documents are produced daily, titles are important clues that let readers grasp the key ideas of documents quickly; however, many document types, such as blog articles and social-media messages, have no titles. In this paper, we propose a title-generation model that uses sequence-to-sequence RNNs with attention and copying mechanisms. In the proposed model, input sentences are encoded with bi-directional GRU (gated recurrent unit) networks, and title words are generated by decoding the encoded sentences together with keywords automatically selected from the input sentences. In experiments with 93,631 training documents and 500 test documents, the attention mechanism was more effective (ROUGE-1: 0.1935, ROUGE-2: 0.0364, ROUGE-L: 0.1555) than the copying mechanism, and it also achieved higher relative performance in the qualitative evaluation.
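
A copying mechanism lets the decoder emit a word taken directly from the input, which helps with rare keywords. One common formulation, the pointer-generator mixture, combines the decoder's vocabulary distribution with the attention weights over source positions using a generation probability `p_gen`; the abstract does not say whether the authors used exactly this form, and `copy_mixture` is a hypothetical name.

```python
from collections import defaultdict


def copy_mixture(p_gen: float, vocab_dist: dict[str, float],
                 source_tokens: list[str], attention: list[float]) -> dict[str, float]:
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source positions holding w."""
    final: dict[str, float] = defaultdict(float)
    # Generation path: scale the decoder's vocabulary distribution.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy path: each source position contributes its attention weight to its own token.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1 - p_gen) * weight
    return dict(final)
```

If both input distributions sum to 1, so does the result. A token that is missing from the output vocabulary but present in the source can only be produced through the copy path.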

Answer Snippet Retrieval for Question Answering of Medical Documents

Hyeon-gu Lee, Minkyoung Kim, Harksoo Kim

http://doi.org/

With the explosive growth in the number of online medical documents, demand for question-answering systems is increasing. Recently, question-answering models based on machine learning have shown high performance in various domains. However, many question-answering models in the medical domain are still based on information retrieval techniques because training data are sparse. Building on various information retrieval techniques, we propose an answer snippet retrieval model for question-answering systems over medical documents. The proposed model first retrieves candidate answer sentences from medical documents using a cluster-based retrieval technique. It then generates reliable answer snippets by re-ranking the candidate answer sentences with a model based on various sentence retrieval techniques. In experiments with BioASQ 4b, the proposed model outperformed previous models (MAP of 0.0604).
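
The reported metric, MAP (mean average precision), averages over questions the precision measured at each rank where a relevant snippet appears. The sketch below is the textbook definition, assuming binary relevance judgments and a known number of relevant items per question; BioASQ's official snippet scorer has its own details, so this is not the challenge's evaluation code.

```python
def average_precision(ranked_relevance: list[bool], num_relevant: int) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / num_relevant


def mean_average_precision(runs: list[tuple[list[bool], int]]) -> float:
    """MAP over (ranked_relevance, num_relevant) pairs, one pair per question."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)
```

For a ranking judged relevant, not relevant, relevant with two relevant items in total, AP is (1/1 + 2/3) / 2, or about 0.833.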

One-Class Classification Model Based on Lexical Information and Syntactic Patterns

Hyeon-gu Lee, Maengsik Choi, Harksoo Kim

http://doi.org/

Relation extraction is an important information extraction technique that can be widely used in areas such as question answering and knowledge population. Previous studies on relation extraction have been based on supervised machine learning models that need a large amount of training data manually annotated with relation categories. Recently, distant supervision methods have been proposed to reduce the manual effort of constructing training data. However, these methods have a drawback: they are difficult to use for collecting the negative training data needed for classification. To overcome this drawback, we propose a one-class classification model that can be trained without negative data. The proposed model determines whether an input item belongs to the inner (target) category using a similarity measure, computed in a vector space, based on lexical information and syntactic patterns. In our experiments, the proposed model achieved higher performance (F1-score of 0.6509 and accuracy of 0.6833) than a representative one-class classification model, the one-class SVM (support vector machine).
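
A one-class classifier of this kind needs only positive examples: it accepts an input when the input is similar enough to the known members of the category. The sketch below is a simplified stand-in, not the paper's model. It represents each example as a list of hypothetical lexical and syntactic-pattern feature strings, averages the positive examples into a centroid, and accepts inputs whose cosine similarity to the centroid meets a threshold; the feature names, the centroid approach, and the threshold are all assumptions.

```python
import math
from collections import Counter


class CentroidOneClass:
    """Accept an item if its cosine similarity to the centroid of positive examples >= threshold."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # hypothetical value; would be tuned on held-out data
        self.centroid: Counter = Counter()

    def fit(self, positives: list[list[str]]) -> "CentroidOneClass":
        # Average feature counts over positive examples only; no negatives are used.
        for features in positives:
            for f, c in Counter(features).items():
                self.centroid[f] += c / len(positives)
        return self

    def similarity(self, features: list[str]) -> float:
        x = Counter(features)
        dot = sum(x[f] * self.centroid[f] for f in x)  # missing keys read as 0
        norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(
            sum(v * v for v in self.centroid.values()))
        return dot / norm if norm else 0.0

    def predict(self, features: list[str]) -> bool:
        return self.similarity(features) >= self.threshold
```

Only the decision threshold separates "inside" from "outside" the category, which is what allows training without negative data.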

Data Consistency-Control Scheme Using a Rollback-Recovery Mechanism for Storage Class Memory

Hyun Ku Lee, Junghoon Kim, Dong Hyun Kang, Young Ik Eom

http://doi.org/

Storage Class Memory (SCM) is considered a next-generation storage device because it can be used as both memory and storage. However, recently proposed file systems for SCM have significant data consistency problems, such as insufficient consistency guarantees or excessive consistency-control overhead. This paper proposes a novel data consistency-control scheme that uses a rollback-recovery scheme instead of Write-Ahead Logging (WAL) and changes the write mode for log data depending on the ratio of modified data in a block. The proposed scheme reduces the log data size and the synchronization cost of maintaining data consistency. To evaluate the proposed scheme, we implemented it on a Linux 3.10.2-based system and measured its performance. The experimental results show that our scheme improves write throughput by a factor of 9 on average compared with the legacy data consistency-control scheme.
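
The abstract describes switching the log write mode according to how much of a block is modified, with rollback recovery (saving the old data so that an interrupted update can be undone) in place of write-ahead logging. The in-memory sketch below illustrates only that decision and the rollback step. The 0.5 threshold, the mode names "block" and "range", and all function names are hypothetical, and a real SCM implementation would also have to persist the undo data with cache-line flushes and memory fences before the in-place update.

```python
BLOCK_SIZE = 4096
RATIO_THRESHOLD = 0.5  # hypothetical switch point between the two log modes


def choose_log_mode(modified_bytes: int, block_size: int = BLOCK_SIZE) -> str:
    """Log the whole old block when most of it changes, else only the changed range."""
    return "block" if modified_bytes / block_size >= RATIO_THRESHOLD else "range"


def update_block(block: bytearray, offset: int, data: bytes) -> tuple[str, tuple[int, bytes]]:
    """Save undo data for the region being overwritten, then update the block in place."""
    mode = choose_log_mode(len(data), len(block))
    if mode == "block":
        undo = (0, bytes(block))  # copy of the entire old block
    else:
        undo = (offset, bytes(block[offset:offset + len(data)]))  # only the old bytes of the range
    block[offset:offset + len(data)] = data
    return mode, undo


def rollback(block: bytearray, undo: tuple[int, bytes]) -> None:
    """Restore the saved old data after an interrupted update."""
    offset, old = undo
    block[offset:offset + len(old)] = old
```

Small edits log only the bytes they overwrite, which keeps the log small; large edits copy the block once rather than tracking many fragments.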


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr