Search : [ author: 김진태 ] (2)

Document Summarization Using TextRank Based on Sentence Embedding

Seok-won Jeong, Jintae Kim, Harksoo Kim

http://doi.org/10.5626/JOK.2019.46.3.285

Document summarization is the task of creating a short version of a document that preserves the main content of the original. Extractive summarization has been actively studied because it copies text directly from the original document, which guarantees a basic level of grammaticality and a high level of accuracy. However, TextRank, a typical extractive summarization method, computes the edge weights of its graph from word frequencies, which makes it difficult to take the meaning of sentences into account. To address this drawback, we propose a new TextRank that uses sentence embeddings. Through experiments, we confirmed that the proposed method captures the meaning of sentences better than the existing method.
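As a rough illustration of the idea in this abstract, the sketch below runs a TextRank-style ranking over a graph whose edge weights are cosine similarities between sentence vectors rather than word-overlap counts. The abstract does not say which embedding model or similarity function the authors use, so the function names (`cosine`, `textrank_scores`, `summarize`), the damping factor of 0.85, the fixed iteration count, and the toy two-dimensional vectors are all assumptions made for this example, not the paper's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def textrank_scores(embeddings, damping=0.85, iterations=50):
    """Weighted PageRank over a sentence graph built from embedding similarity."""
    n = len(embeddings)
    # Edge weight = cosine similarity clipped at 0 (assumption); no self-loops.
    weights = [[max(cosine(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge j -> i.
            incoming = sum(weights[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if out_weight[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores

def summarize(sentences, embeddings, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank_scores(embeddings)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

# Toy vectors standing in for real sentence embeddings (hypothetical values).
sentences = ["A", "B", "C"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
summary = summarize(sentences, vectors, k=1)
```

In practice the vectors would come from a sentence encoder; only the edge-weight computation changes relative to frequency-based TextRank, while the ranking step stays the same.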

Effective Generative Chatbot Model Trainable with a Small Dialogue Corpus

Jintae Kim, Hyeon-gu Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2019.46.3.246

Unlike popular retrieval-based chatbot models, generative chatbot models do not depend on predefined responses; instead, they generate new responses using well-trained neural networks. However, they require a large training corpus of query-response pairs. If the training corpus is insufficient, they make grammatical errors caused by out-of-vocabulary or sparse-data problems, mostly in longer sentences. To overcome this challenge, we propose a chatbot model based on a sequence-to-sequence neural network that uses a mixture of words and syllables as encoding-decoding units. Moreover, we propose a two-step training procedure consisting of pre-training on a large non-dialogue corpus and retraining on a smaller dialogue corpus. In an experiment with a small dialogue corpus (47,089 query-response pairs for training and 3,000 query-response pairs for evaluation), the proposed encoding-decoding units reduced the out-of-vocabulary problem, while the two-step training method improved performance measures such as BLEU and ROUGE.
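One plausible reading of "a mixture of words and syllables as encoding-decoding units" is sketched below: words seen often enough in the training corpus stay whole, and all other words fall back to syllable units, so no input maps to an unknown token. The abstract does not describe the actual mixing rule, so the frequency threshold `min_count`, the `##` continuation marker, and the function names are hypothetical. The sketch also assumes Hangul-like text in which each character is one syllable block.

```python
from collections import Counter

def build_word_vocab(corpus, min_count=2):
    """Keep whitespace-separated words that occur at least min_count times."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    return {word for word, count in counts.items() if count >= min_count}

def mixed_tokenize(sentence, word_vocab):
    """Emit whole words when known, otherwise one token per syllable."""
    tokens = []
    for word in sentence.split():
        if word in word_vocab:
            tokens.append(word)
        else:
            # Fallback to syllables; "##" marks a syllable that continues a word
            # (a convention assumed here so the word can be reassembled later).
            tokens.extend(ch if i == 0 else "##" + ch for i, ch in enumerate(word))
    return tokens

# Tiny hypothetical corpus: "나는" and "간다" appear twice, the other words once.
corpus = ["나는 학교에 간다", "나는 집에 간다"]
vocab = build_word_vocab(corpus, min_count=2)
tokens = mixed_tokenize("나는 회사에 간다", vocab)
```

Here the unseen word "회사에" is split into three syllable tokens while the two frequent words are kept intact, which is the kind of behavior that would reduce out-of-vocabulary tokens on a small corpus.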


Journal of KIISE

  • ISSN: 2383-630X (Print)
  • ISSN: 2383-6296 (Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr