Search : [ author: 이익훈 ] (2)

A Product Review Summarization Considering Additional Information

Jaeyeun Yoon, Ig-hoon Lee, Sang-goo Lee

http://doi.org/10.5626/JOK.2020.47.2.180

Automatic document summarization is a task that generates the document in a suitable form from an existing document for a certain user or occasion. As use of the Internet increases, the various data including texts are exploding and the value of document summarization technology is growing. While the latest deep learning-based models show reliable performance in document summarization, the problem is that performance depends on the quantity and quality of the training data. For example, it is difficult to generate reliable summarization with existing models from the product review text of online shopping malls because of typing errors and grammatically wrong sentences. Online malls and portal web services are struggling to solve this problem. Thus, to generate an appropriate document summary in poor condition relative to quality and quantity of the product review learning data, this study proposes a model that generates product review summaries with additional information. We found through experiments that this model showed improved performances in terms of relevance and readability than the existing model for product review summaries.

Korean Morphological Analyzer for Neologism and Spacing Error based on Sequence-to-Sequence

Byeongseo Choe, Ig-hoon Lee, Sang-goo Lee

http://doi.org/10.5626/JOK.2020.47.1.70

In order to analyze Internet text data from Korean internet communities, it is necessary to accurately perform morphological analysis even in a sentence with a spacing error and adequate restoration of original form for an out-of-vocabulary input. However, the existing Korean morphological analyzer often uses dictionaries and complicate preprocessing for the restoration. In this paper, we propose a Korean morphological analyzer model which is based on the sequence-to-sequence model. The model can effectively handle the spacing problem and OOV problem. In addition, the model uses syllable bigram and grapheme as additional input features. The proposed model does not use a dictionary and minimizes rule-based preprocessing. The proposed model showed better performance than other morphological analyzers without a dictionary in the experiment for Sejong corpus. Also, better performance was evident for the dataset without space and sample dataset collected from Internet.


Search




Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr