Probabilistic Segmentation and Tagging of Unknown Words 


Vol. 43, No. 4, pp. 430-436, Apr. 2016



  Abstract

Processing unknown words, such as proper nouns and newly coined words, is important for a morphological analyzer that must handle documents from various domains. This study proposes a segmentation and tagging method for unknown Korean words within a 3-step probabilistic morphological analysis. To guess unknown words, the method uses the rich suffixes that attach to open-class words such as general nouns and proper nouns. We propose a way to learn suffix patterns from a morpheme-tagged corpus and to calculate their probabilities for segmenting and tagging unknown open-class words in the probabilistic morphological analysis model. Experimental results showed that unknown word processing performance improved greatly on documents containing many unregistered words.
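The suffix-pattern idea can be illustrated with a small sketch. It is not the authors' 3-step model: it assumes a toy corpus of words given as lists of (morpheme, tag) pairs, Sejong-style tags (NNG, NNP, JKB, and so on), plain concatenation of morphemes with no contraction, and a simple relative-frequency estimate. The names `TOY_CORPUS`, `learn_suffix_patterns`, and `guess` are hypothetical.

```python
from collections import Counter

# Assumption: open-class tags are general noun (NNG) and proper noun (NNP).
OPEN_TAGS = {"NNG", "NNP"}

# Hypothetical toy corpus: each word is a list of (morpheme, tag) pairs.
TOY_CORPUS = [
    [("학교", "NNG"), ("에서", "JKB")],
    [("서울", "NNP"), ("에서", "JKB")],
    [("부산", "NNP"), ("에서", "JKB")],
    [("철수", "NNP"), ("가", "JKS")],
    [("사과", "NNG"), ("를", "JKO")],
    [("가", "VV"), ("다", "EF")],  # not open-class, so it is skipped
]


def learn_suffix_patterns(corpus):
    """Estimate P(suffix surface, suffix tags, stem tag) by relative frequency."""
    counts = Counter()
    for word in corpus:
        (_, stem_tag), rest = word[0], word[1:]
        if stem_tag not in OPEN_TAGS:
            continue
        surface = "".join(morph for morph, _ in rest)
        tags = tuple(tag for _, tag in rest)
        counts[(surface, tags, stem_tag)] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def guess(word, patterns):
    """Return candidate analyses of an unknown word, most probable first."""
    candidates = []
    for (surface, tags, stem_tag), prob in patterns.items():
        # The stem must be non-empty, so the suffix cannot cover the whole word.
        if word.endswith(surface) and len(word) > len(surface):
            stem = word[: len(word) - len(surface)]
            candidates.append((prob, stem, stem_tag, surface, tags))
    return sorted(candidates, reverse=True)


patterns = learn_suffix_patterns(TOY_CORPUS)
best = guess("판교에서", patterns)[0]
print(best)
```

Here the unseen word 판교에서 is split into an unknown stem and a learned suffix, and the stem receives the tag that co-occurred most often with that suffix in the toy corpus. A real analyzer would combine such scores with the rest of the morphological model rather than use them alone.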




  Cite this article

[IEEE Style]

B. Kim and J. S. Lee, "Probabilistic Segmentation and Tagging of Unknown Words," Journal of KIISE, JOK, vol. 43, no. 4, pp. 430-436, 2016.


[ACM Style]

Bogyum Kim and Jae Sung Lee. 2016. Probabilistic Segmentation and Tagging of Unknown Words. Journal of KIISE, JOK, 43, 4, (2016), 430-436.


[KCI Style]

김보겸, 이재성, "확률 기반 미등록 단어 분리 및 태깅," 한국정보과학회 논문지, 제43권, 제4호, 430~436쪽, 2016. DOI: .






Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
