Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer 


Vol. 42, No. 1, pp. 68-75, Jan. 2015



  Abstract

This paper proposes a method for automatically inserting word spaces into unsegmented Korean sentences. Instead of the syllable n-grams used in previous studies, our method uses eojeol monograms for word spacing. The Korean morphological analyzer is used only to correct typical word spacing errors. In 10-fold cross-validation on the Sejong corpus, after non-Hangul eojeols were filtered out, the method achieves 98.06% syllable accuracy and 94.15% eojeol recall. It processes 250K eojeols, or 1.8 MB of text, per second on a typical personal computer. Syllable accuracy and eojeol recall are related to the size of the eojeol dictionary, so better performance is expected with a larger corpus.
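To illustrate how an eojeol unigram (monogram) dictionary built from a raw spaced corpus can drive word spacing, here is a minimal Python sketch. It is not the author's algorithm. The dynamic-programming search, the log-frequency scoring, the fixed penalty for unknown single syllables, and the function names (`build_dictionary`, `segment`, `syllable_accuracy`, `eojeol_recall`) are hypothetical choices, and the morphological-analyzer correction step is omitted entirely. The metric functions use common definitions: syllable accuracy as per-syllable agreement on whether a space follows, and eojeol recall as the share of reference eojeols recovered with identical boundaries. The paper's exact definitions may differ.

```python
import math
from collections import Counter


def build_dictionary(spaced_sentences):
    """Count eojeol unigrams (whitespace-delimited tokens) in a raw spaced corpus."""
    counts = Counter()
    for sentence in spaced_sentences:
        counts.update(sentence.split())
    return counts


def segment(text, counts, max_len=10, unknown_penalty=-20.0):
    """Insert spaces into unspaced text by maximizing the summed log-probability
    of dictionary eojeols (hypothetical unigram DP, not the paper's method).
    Unknown material is covered one syllable at a time at a fixed penalty,
    so some segmentation always exists."""
    total = sum(counts.values())
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last eojeol
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in counts:
                score = math.log(counts[piece] / total)
            elif i - j == 1:
                score = unknown_penalty
            else:
                continue
            if best[j] + score > best[i]:
                best[i] = best[j] + score
                back[i] = j
    # Follow back-pointers from the end to recover the eojeols.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return " ".join(reversed(words))


def space_tags(spaced):
    """One tag per syllable: True if an eojeol boundary follows it."""
    tags = []
    for token in spaced.split():
        tags.extend([False] * (len(token) - 1) + [True])
    return tags


def syllable_accuracy(gold, pred):
    """Fraction of syllables whose space/no-space tag matches the reference."""
    g, p = space_tags(gold), space_tags(pred)
    assert len(g) == len(p), "gold and predicted text must have the same syllables"
    return sum(a == b for a, b in zip(g, p)) / len(g)


def eojeol_spans(spaced):
    """Set of (start, end) syllable offsets of each eojeol, ignoring spaces."""
    spans, start = set(), 0
    for token in spaced.split():
        spans.add((start, start + len(token)))
        start += len(token)
    return spans


def eojeol_recall(gold, pred):
    """Fraction of reference eojeols reproduced with exactly the same boundaries."""
    g = eojeol_spans(gold)
    return len(g & eojeol_spans(pred)) / len(g)
```

With the toy corpus `["나는 학교에 간다", "학교에 가는 길", "나는 밥을 먹는다"]`, `segment("나는학교에간다", counts)` returns `"나는 학교에 간다"`. Scoring the erroneous output `"나는학교에 간다"` against that reference gives a syllable accuracy of 6/7 (only the boundary after 는 is wrong) but an eojeol recall of 1/3. This shows why the abstract reports syllable accuracy well above eojeol recall.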




  Cite this article

[IEEE Style]

K. Shim, "Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer," Journal of KIISE, JOK, vol. 42, no. 1, pp. 68-75, 2015. DOI: .


[ACM Style]

Kwangseob Shim. 2015. Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer. Journal of KIISE, JOK, 42, 1, (2015), 68-75. DOI: .


[KCI Style]

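Kwangseob Shim (심광섭), "Automatic Korean Word Spacing Using a Corpus and a Morphological Analyzer" (말뭉치와 형태소 분석기를 활용한 한국어 자동 띄어쓰기), Journal of KIISE (한국정보과학회 논문지), vol. 42, no. 1, pp. 68-75, 2015. DOI: .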









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
