Search results for author: 이재성 (Jae Sung Lee), 3 papers

A Deep Learning-based Two-Steps Pipeline Model for Korean Morphological Analysis and Part-of-Speech Tagging

Jun Young Youn, Jae Sung Lee

http://doi.org/10.5626/JOK.2021.48.4.444

Recent studies on Korean morphological analysis with artificial neural networks have usually performed morpheme segmentation and part-of-speech tagging first, and then restored the original forms of morphemes with a dictionary as a postprocessing step. In this study, we divide morphological analysis into two steps: first, the original forms of morphemes are restored with a sequence-to-sequence model; then, morpheme segmentation and part-of-speech tagging are performed with BERT. This two-step pipeline achieved performance comparable to that of other approaches, even without a morpheme-restoration dictionary, which would require rules or compound-tag processing.
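One way to see why restoring original forms first simplifies the second step: once a surface form such as 했다 has been restored to 하였다, each syllable of the restored string belongs to a single morpheme, so segmentation and tagging can be treated as labeling the restored syllables. The sketch below assumes a syllable-level B-/I- labeling scheme (an assumption; the abstract does not state the label format), hardcodes the outputs that the sequence-to-sequence model and BERT would produce, and shows only how those labels decode into morpheme/tag pairs. Tags follow Sejong-style conventions (VV, EP, EF, NNG, JKB) for illustration.

```python
def decode_bio(restored, labels):
    """Group syllables of a restored eojeol into (morpheme, tag) pairs.

    Assumes one label per syllable, of the form "B-TAG" (begin) or
    "I-TAG" (inside); this labeling scheme is an illustrative assumption.
    """
    if len(restored) != len(labels):
        raise ValueError("need exactly one label per syllable")
    morphemes = []
    for syllable, label in zip(restored, labels):
        prefix, tag = label.split("-", 1)
        # Start a new morpheme on "B", or on an "I" that cannot continue
        # the previous morpheme because its tag differs (malformed input).
        if prefix == "B" or not morphemes or morphemes[-1][1] != tag:
            morphemes.append([syllable, tag])
        else:
            morphemes[-1][0] += syllable
    return [(form, tag) for form, tag in morphemes]


# Hypothetical step-1 output: surface form "했다" restored to "하였다".
restored = "하였다"
# Hypothetical step-2 output: one label per restored syllable.
labels = ["B-VV", "B-EP", "B-EF"]
print(decode_bio(restored, labels))
# [('하', 'VV'), ('였', 'EP'), ('다', 'EF')]
```

In this arrangement the tagger never has to decide how the contracted syllable 했 splits; that burden is moved entirely to the restoration step.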

Morpheme-based Korean Word Vector Generation Considering the Subword and Part-Of-Speech Information

Junyoung Youn, Jae Sung Lee

http://doi.org/10.5626/JOK.2020.47.4.395

Word vectors make it possible to find relationships between words through vector computation, and they are widely used as pre-trained input for higher-level neural network models. Various models adapted from English word-vector models have been proposed for generating Korean word vectors, using different segmentation units such as the Eojeol (word phrase), the morpheme, the syllable, and the Jaso (individual consonant and vowel letters). In this study, we propose Korean word vector generation methods that segment each Eojeol into morphemes and then break the morphemes into subwords composed of either syllables or Jaso. We also propose methods that use the Part-Of-Speech tags provided during preprocessing to reflect the semantic and syntactic information of the morphemes. Intrinsic and extrinsic evaluations showed that the method combining morpheme segmentation, Jaso subwords, and additional Part-Of-Speech tags outperformed the other methods when the target data were standard text that is largely grammatically correct.
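Jaso units can be obtained arithmetically, because each precomposed Hangul syllable in Unicode (U+AC00 to U+D7A3) is encoded as `0xAC00 + (initial × 21 + medial) × 28 + final`. The sketch below decomposes a morpheme into Jaso and then produces fastText-style character n-grams over either syllables or Jaso, adding the POS tag as one extra unit. The n-gram range (3 to 6, fastText's defaults), the `<`/`>` boundary markers, and the hypothetical `POS:` unit are assumptions for illustration, not the paper's exact configuration.

```python
# Compatibility jamo: 19 initials, 21 medials, and 27 finals plus "no final".
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]


def to_jaso(text):
    """Decompose precomposed Hangul syllables into Jaso letters."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # 19 * 21 * 28 precomposed syllables
            out.append(INITIALS[code // 588])         # 588 = 21 * 28
            out.append(MEDIALS[(code % 588) // 28])
            out.append(FINALS[code % 28])
        else:
            out.append(ch)  # leave non-Hangul characters unchanged
    return "".join(out)


def subword_units(morpheme, pos, use_jaso=True, min_n=3, max_n=6):
    """Return the morpheme token, its character n-grams, and a POS unit."""
    body = to_jaso(morpheme) if use_jaso else morpheme
    token = "<" + body + ">"  # boundary markers, as in fastText
    units = {token}
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            units.add(token[i:i + n])
    units.add(f"POS:{pos}")  # hypothetical way of attaching the POS tag
    return units


print(to_jaso("하늘"))  # ㅎㅏㄴㅡㄹ
print(sorted(subword_units("하늘", "NNG", use_jaso=False)))
```

Jaso n-grams let morphemes that share letters but not whole syllables (for example, different conjugated forms of the same stem) share vector components, which syllable n-grams cannot capture.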

Probabilistic Segmentation and Tagging of Unknown Words

Bogyum Kim, Jae Sung Lee

http://doi.org/

Handling unknown words, such as proper nouns and newly coined words, is important for a morphological analyzer that must process documents from various domains. In this study, we propose a segmentation and tagging method for unknown Korean words within a 3-step probabilistic morphological analysis. To guess unknown words, the method exploits the rich set of suffixes that attach to open-class words such as general nouns and proper nouns. We propose a method that learns these suffix patterns from a morpheme-tagged corpus and calculates their probabilities for segmenting and tagging unknown open-class words within the probabilistic morphological analysis model. Experimental results showed that unknown-word processing performance improved greatly on documents containing many unregistered words.
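The suffix-pattern idea can be sketched as relative-frequency estimation over a tagged corpus. In the sketch below, each training eojeol whose first morpheme is an open-class noun contributes a pattern (noun tag, analysis of the remaining suffix) keyed by the suffix's surface string, and an unknown eojeol is split at the longest suffix seen in training, with candidate analyses ranked by P(pattern | suffix). The Sejong-style tag set (`NNG`/`NNP`), the toy corpus, and the longest-suffix heuristic are assumptions; the paper's actual scoring inside its 3-step probabilistic model is not reproduced here.

```python
from collections import Counter, defaultdict

OPEN_TAGS = {"NNG", "NNP"}  # assumed open-class tags (general/proper noun)


def learn_suffix_patterns(corpus):
    """Estimate P(pattern | suffix) from (eojeol, [(morpheme, tag), ...])."""
    counts = defaultdict(Counter)
    for surface, analysis in corpus:
        stem, stem_tag = analysis[0]
        # Learn only from eojeols that start with an unchanged open-class word.
        if stem_tag not in OPEN_TAGS or not surface.startswith(stem):
            continue
        suffix = surface[len(stem):]
        counts[suffix][(stem_tag, tuple(analysis[1:]))] += 1
    table = {}
    for suffix, counter in counts.items():
        total = sum(counter.values())
        table[suffix] = {p: c / total for p, c in counter.items()}
    return table


def guess_unknown(surface, table):
    """Split an unknown eojeol at its longest known suffix and rank analyses."""
    for i in range(1, len(surface) + 1):  # stem is never empty
        suffix = surface[i:]
        if suffix in table:
            stem = surface[:i]
            candidates = [
                ([(stem, stem_tag), *rest], prob)
                for (stem_tag, rest), prob in table[suffix].items()
            ]
            return sorted(candidates, key=lambda c: c[1], reverse=True)
    return []


# Toy tagged corpus (hypothetical data for illustration only).
corpus = [
    ("학교에서", [("학교", "NNG"), ("에서", "JKB")]),
    ("서울에서", [("서울", "NNP"), ("에서", "JKB")]),
    ("부산에서", [("부산", "NNP"), ("에서", "JKB")]),
    ("학생이", [("학생", "NNG"), ("이", "JKS")]),
    ("사람이다", [("사람", "NNG"), ("이", "VCP"), ("다", "EF")]),
]
table = learn_suffix_patterns(corpus)
for analysis, prob in guess_unknown("대전에서", table):
    print(analysis, round(prob, 3))
# [('대전', 'NNP'), ('에서', 'JKB')] 0.667
# [('대전', 'NNG'), ('에서', 'JKB')] 0.333
```

Because the unknown stem 대전 never appears in the corpus, its tag is inferred entirely from how often the suffix 에서 followed proper versus general nouns in training.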


Journal of KIISE

  • ISSN: 2383-630X (Print)
  • ISSN: 2383-6296 (Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr