Korean Morphological Analyzer for Neologism and Spacing Error based on Sequence-to-Sequence 


Vol. 47,  No. 1, pp. 70-77, Jan.  2020
10.5626/JOK.2020.47.1.70



  Abstract

To analyze text from Korean Internet communities, a morphological analyzer must remain accurate on sentences that contain spacing errors and must adequately restore the original form of out-of-vocabulary (OOV) input. However, existing Korean morphological analyzers often rely on dictionaries and complicated preprocessing for this restoration. In this paper, we propose a Korean morphological analyzer based on the sequence-to-sequence model that effectively handles both the spacing problem and the OOV problem. The model uses syllable bigrams and graphemes as additional input features, does not use a dictionary, and minimizes rule-based preprocessing. In experiments on the Sejong corpus, the proposed model outperformed other morphological analyzers that do not use a dictionary. It also performed better on the dataset without spaces and on a sample dataset collected from the Internet.
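As an illustration of the two additional input features named in the abstract, the following minimal sketch derives syllable bigrams and graphemes (jamo) from a Korean string. It is not the authors' implementation: the function names `decompose` and `syllable_bigrams` are hypothetical, the abstract does not specify how the features are built or padded, and the decomposition simply uses the standard Unicode arithmetic for precomposed Hangul syllables.

```python
# Hypothetical sketch: grapheme (jamo) and syllable-bigram features for Korean text.
# Assumption: the feature construction here is illustrative, not taken from the paper.

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable, "힣"

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]      # 28 finals ("" = none)


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, vowel, final).

    Characters outside the Hangul syllable block are returned unchanged
    in the first slot, with empty vowel and final slots.
    """
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return (syllable, "", "")
    index = code - HANGUL_BASE
    initial = CHOSEONG[index // (21 * 28)]
    vowel = JUNGSEONG[(index % (21 * 28)) // 28]
    final = JONGSEONG[index % 28]
    return (initial, vowel, final)


def syllable_bigrams(text):
    """Return adjacent syllable pairs after removing spaces.

    Removing spaces mimics input whose spacing cannot be trusted.
    """
    syllables = text.replace(" ", "")
    return [syllables[i:i + 2] for i in range(len(syllables) - 1)]


if __name__ == "__main__":
    sentence = "한국어 분석"
    print([decompose(ch) for ch in sentence.replace(" ", "")])
    print(syllable_bigrams(sentence))
```

Because both features are computed from the surface string alone, they need no dictionary lookup, which is in keeping with the dictionary-free design the abstract describes.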




  Cite this article

[IEEE Style]

B. Choe, I. Lee, S. Lee, "Korean Morphological Analyzer for Neologism and Spacing Error based on Sequence-to-Sequence," Journal of KIISE, JOK, vol. 47, no. 1, pp. 70-77, 2020. DOI: 10.5626/JOK.2020.47.1.70.


[ACM Style]

Byeongseo Choe, Ig-hoon Lee, and Sang-goo Lee. 2020. Korean Morphological Analyzer for Neologism and Spacing Error based on Sequence-to-Sequence. Journal of KIISE, JOK, 47, 1, (2020), 70-77. DOI: 10.5626/JOK.2020.47.1.70.


[KCI Style]

최병서, 이익훈, 이상구, "신조어 및 띄어쓰기 오류에 강인한 시퀀스-투-시퀀스 기반 한국어 형태소 분석기," 한국정보과학회 논문지, 제47권, 제1호, 70~77쪽, 2020. DOI: 10.5626/JOK.2020.47.1.70.









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
