Research on Joint Models for Korean Word Spacing and POS (Part-Of-Speech) Tagging based on Bidirectional LSTM-CRF
http://doi.org/10.5626/JOK.2018.45.8.792
In general, Korean part-of-speech tagging takes as input a sentence whose word spacing has already been completed. To process a sentence that is not properly spaced, automatic word spacing is first needed to correct the spacing errors. However, if automatic spacing and POS tagging are performed sequentially, errors occurring at each step can accumulate and cause serious performance degradation. In this study, we address this problem by constructing an integrated model that performs automatic spacing and POS (Part-Of-Speech) tagging simultaneously. Based on the Bidirectional LSTM-CRF model, we propose an integrated model that performs syllable-based word spacing and POS tagging jointly, so that the two tasks complement each other. In experiments on the Sejong POS-tagged corpus, we obtained 98.77% POS tagging accuracy on completely spaced sentences and 97.92% morpheme accuracy on sentences with no word spacing at all.
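The abstract does not specify the label set, but one common way to let a single syllable-level sequence labeler handle both tasks is to give each syllable a joint label that combines a spacing tag with a morpheme-level BIO POS tag. The Python sketch below is an assumption for illustration only: the `B|B-NNG` label format and the `encode`/`decode` helpers are hypothetical, and it assumes every morpheme's surface form is a contiguous substring of its eojeol, ignoring the contractions and irregular conjugations that real Sejong data contains. A BiLSTM-CRF would predict these labels from the unspaced syllable string; only the label conversion is shown.

```python
from typing import List, Tuple

Morph = Tuple[str, str]   # (surface form, POS tag), e.g. ("학생", "NNG")
Eojeol = List[Morph]      # one space-delimited word = a list of morphemes


def encode(sentence: List[Eojeol]) -> Tuple[str, List[str]]:
    """Turn a spaced, POS-tagged sentence into (unspaced syllables, joint labels).

    Hypothetical label format "S|P-POS":
      S     = B if the syllable starts an eojeol (a space precedes it), else I
      P-POS = B-/I- position of the syllable inside its morpheme, plus the POS tag
    Assumes each morpheme surface is a contiguous substring (no contractions).
    """
    syllables: List[str] = []
    labels: List[str] = []
    for eojeol in sentence:
        first_in_eojeol = True
        for surface, pos in eojeol:
            for i, syl in enumerate(surface):
                space_tag = "B" if first_in_eojeol else "I"
                pos_tag = ("B-" if i == 0 else "I-") + pos
                syllables.append(syl)
                labels.append(f"{space_tag}|{pos_tag}")
                first_in_eojeol = False
    return "".join(syllables), labels


def decode(syllables: str, labels: List[str]) -> List[Eojeol]:
    """Invert encode(): rebuild eojeols and morphemes from per-syllable labels.

    Inconsistent label sequences are repaired: a new eojeol always opens a new
    morpheme, and an I- tag whose POS differs from the previous one starts a new morpheme.
    """
    sentence: List[Eojeol] = []
    for syl, label in zip(syllables, labels):
        space_tag, pos_tag = label.split("|")
        bio, pos = pos_tag.split("-", 1)
        if space_tag == "B" or not sentence:
            sentence.append([])   # word spacing: this syllable opens a new eojeol
            bio = "B"             # ...and therefore a new morpheme
        eojeol = sentence[-1]
        if bio == "B" or not eojeol or eojeol[-1][1] != pos:
            eojeol.append((syl, pos))
        else:
            surface, _ = eojeol[-1]
            eojeol[-1] = (surface + syl, pos)   # extend the current morpheme
    return sentence
```

Because spacing and POS share one label, a CRF over these labels can in principle learn transitions such as an eojeol boundary preceding a noun rather than an ending, which is one way the two tasks could inform each other.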
Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer
This paper proposes a method for the automatic word spacing of unsegmented Korean sentences. Our method uses eojeol monograms for word spacing, as opposed to the syllable n-grams used in previous studies. A Korean morphological analyzer is used only to correct typical word spacing errors. In 10-fold cross-validation on the Sejong corpus, after filtering out non-Hangul eojeols, our method achieves 98.06% syllable accuracy and 94.15% eojeol recall. The processing rate is 250K eojeols, or 1.8 MB, per second on a typical personal computer. Because syllable accuracy and eojeol recall are related to the size of the eojeol dictionary, better performance is expected with a bigger corpus.
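The abstract does not describe how eojeol monograms are combined at decoding time. As one plausible reading, offered as an assumption rather than the authors' procedure, the Python sketch below counts eojeols in a spaced raw corpus, then uses dynamic programming to split an unspaced string into the eojeol sequence with the highest total unigram log-probability. It also shows one common way to compute syllable accuracy and eojeol recall. The helper names, the `max_len` and `unk_logp` parameters, and the exact metric definitions are hypothetical, and neither the morphological-analyzer correction step nor the non-Hangul filtering is modeled.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def build_unigram_model(raw_corpus: List[str]) -> Dict[str, float]:
    """Hypothetical helper: eojeol log-probabilities counted from spaced raw text."""
    counts = Counter(eojeol for line in raw_corpus for eojeol in line.split())
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}


def segment(text: str, logp: Dict[str, float],
            max_len: int = 10, unk_logp: float = -20.0) -> List[str]:
    """Split an unspaced string into eojeols maximizing the summed unigram log-prob.

    Known eojeols up to max_len syllables are scored by the model; an unknown
    single syllable gets the fixed penalty unk_logp so any input is segmentable.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)   # best[i]: best score for text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last eojeol
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            score: Optional[float] = logp.get(piece, unk_logp if i - j == 1 else None)
            if score is not None and best[j] + score > best[i]:
                best[i] = best[j] + score
                back[i] = j
    words: List[str] = []
    i = n
    while i > 0:                   # follow back-pointers from the end
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def _spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character spans (start, end) of each eojeol in the unspaced string."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def eojeol_recall(gold: List[str], pred: List[str]) -> float:
    """Fraction of gold eojeols whose exact span also appears in the prediction."""
    g = _spans(gold)
    return len(g & _spans(pred)) / len(g)


def syllable_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of syllables whose 'starts an eojeol' decision matches the gold one."""
    def tags(words: List[str]) -> List[bool]:
        return [k == 0 for w in words for k in range(len(w))]
    tg, tp = tags(gold), tags(pred)
    return sum(a == b for a, b in zip(tg, tp)) / len(tg)
```

In this sketch, coverage of the eojeol dictionary directly limits which segmentations are reachable without the unknown-syllable penalty, which is consistent with the abstract's expectation that a bigger corpus improves performance.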
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr