Journal of KIISE

Search : [ keyword: 형태소 분석 (morphological analysis) ] (14)

This paper introduces a non-autoregressive Korean morphological analyzer. The proposed analyzer uses a Transformer encoder to encode a given sentence and two non-autoregressive decoders for morphological analysis. The decoders generate a morpheme sequence and a corresponding POS tag sequence, respectively, which are then combined to produce the final morphological analysis. Additionally, this paper uses word segment information within the sentence to predict the target sequence length, mitigating the performance degradation caused by incorrect target length predictions. Experimental results show that the proposed analyzer outperforms all non-autoregressive baselines. It achieves accuracy comparable to an autoregressive Korean morphological analyzer while running about 14.76 times faster.
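As a rough illustration of the combination step described in this abstract, the sketch below pairs a morpheme sequence with a POS tag sequence of the same length. The function names, the `morpheme/TAG` output format, and the sample tags are assumptions made for illustration; the paper's actual combination procedure is not given in this listing.

```python
from typing import List, Tuple


def combine_analysis(morphemes: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Pair each morpheme with the POS tag at the same position (hypothetical helper)."""
    if len(morphemes) != len(tags):
        # The two decoders are assumed to agree on the target length.
        raise ValueError("morpheme and tag sequences differ in length")
    return list(zip(morphemes, tags))


def format_analysis(pairs: List[Tuple[str, str]]) -> str:
    """Render pairs in a 'morpheme/TAG+morpheme/TAG' style (assumed format)."""
    return "+".join(f"{m}/{t}" for m, t in pairs)


pairs = combine_analysis(["나", "는"], ["NP", "JX"])
print(format_analysis(pairs))
```

Raising an error on a length mismatch reflects why target length prediction matters here: a position-wise pairing only works when both sequences have the same length.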

A Deep Learning-based Two-Steps Pipeline Model for Korean Morphological Analysis and Part-of-Speech Tagging

Jun Young Youn, Jae Sung Lee

http://doi.org/10.5626/JOK.2021.48.4.444

Recent studies on Korean morphological analysis using artificial neural networks have usually performed morpheme segmentation and part-of-speech tagging as the first step, with restoration of the original form of morphemes using a dictionary as a postprocessing step. In this study, we divide morphological analysis into two steps: the original form of each morpheme is restored first using a sequence-to-sequence model, and then morpheme segmentation and part-of-speech tagging are performed using BERT. Pipelining these two steps showed performance comparable to other approaches, even without a morpheme-restoring dictionary that requires rules or compound tag processing.

Joint Model of Morphological Analysis and Named Entity Recognition Using Shared Layer

Hongjin Kim, Seongsik Park, Harksoo Kim

http://doi.org/10.5626/JOK.2021.48.2.167

Named entity recognition is a natural language processing technology that finds words with unique meanings, such as names of people, places, and organizations, dates, and times, in sentences and tags them. Morphological analysis in Korean is generally divided into morpheme analysis and part-of-speech tagging. In general, named entity recognition and morphological analysis are studied independently. However, in such a pipeline architecture, errors in morphological analysis propagate to named entity recognition. To alleviate this error propagation problem, we propose an integrated model using the Label Attention Network (LAN). In our experiments, the proposed model performs better than single models for named entity recognition and morphological analysis. It also performs better than previous integrated models.

Performance Analysis of Korean Morphological Analyzer based on Transformer and BERT

Yongseok Choi, Kong Joo Lee

http://doi.org/10.5626/JOK.2020.47.8.730

This paper introduces a Korean morphological analyzer using the Transformer, one of the most popular sequence-to-sequence deep neural models. The Transformer comprises an encoder and a decoder. The encoder compresses a raw input sentence into a fixed-size vector, while the decoder generates a morphological analysis result from that vector. We also replace the encoder with BERT, a pre-trained language representation model. An attention mechanism and a copying mechanism are integrated into the decoder. The processing units of the encoder and the decoder are eojeol-based WordPiece and morpheme-based WordPiece, respectively. Experimental results show that the Transformer with fine-tuned BERT outperforms the randomly initialized Transformer by 2.9% in F1 score. We also investigate the effect of the WordPiece embeddings on morphological analysis when they are not fully updated during training.

Unified Methodology of Multiple POS Taggers for Large-scale Korean Linguistic GS Set Construction

Tae-Young Kim, Pum-Mo Ryu, Hansaem Kim, Hyo-Jung Oh

http://doi.org/10.5626/JOK.2020.47.6.596

In recent years, there has been national support for constructing, sharing, and spreading a large-scale Korean linguistic GS set for Korean information processing. As part of the corpus construction project, this study proposes a methodology for constructing the Korean linguistic GS set using various Korean language analysis modules developed in Korea. To build a large-scale training set, we referred to automatically tagged candidate answers from the N modules. We then minimized manual effort by classifying the error types in the candidate answers and semi-automatically correcting the major error types. In this study, we normalized the results of morphological analysis and constructed a large-scale Korean linguistic GS set based on the unified format U-POS. As a result, 348,229 sentences, a total of 9,455,930 words, were constructed as the Korean linguistic GS set. This set can be applied in practice as a basic training resource for Korean information processing.

Korean Morphological Analyzer for Neologism and Spacing Error based on Sequence-to-Sequence

Byeongseo Choe, Ig-hoon Lee, Sang-goo Lee

http://doi.org/10.5626/JOK.2020.47.1.70

To analyze text data from Korean Internet communities, it is necessary to perform morphological analysis accurately even on sentences with spacing errors and to restore the original form adequately for out-of-vocabulary (OOV) input. However, existing Korean morphological analyzers often rely on dictionaries and complicated preprocessing for this restoration. In this paper, we propose a Korean morphological analyzer based on the sequence-to-sequence model. The model can effectively handle the spacing problem and the OOV problem. In addition, the model uses syllable bigrams and graphemes as additional input features. The proposed model does not use a dictionary and minimizes rule-based preprocessing. In experiments on the Sejong corpus, the proposed model showed better performance than other morphological analyzers that do not use a dictionary. It also performed better on a dataset without spaces and on a sample dataset collected from the Internet.

An Automatic Method of Generating a Large-Scale Train Set for Bi-LSTM based Sentiment Analysis

Min-Seong Choi, Byung-Won On

http://doi.org/10.5626/JOK.2019.46.8.800

Sentiment analysis using deep learning requires a large-scale training set labeled with sentiment. However, direct labeling of sentiment by humans is constrained by time and cost, and it is not easy to collect the data required for sentiment analysis from a large body of data. In the present work, to address these problems, an existing sentiment lexicon was used to assign sentiment scores, and when a sentiment transformation element was present, the sentiment score was reset through dependency parsing and morphological analysis, so that a large-scale training set labeled with sentiment could be generated automatically. The top-k data with the highest sentiment scores were then extracted. Sentiment transformation elements include sentiment reversal, sentiment activation, and sentiment deactivation. Our experimental results show that a large-scale training set can be generated in a shorter time than with manual labeling, and that deep learning performance improves as the size of the training set increases. The accuracy of the model using only the sentiment lexicon was 80.17%, and the accuracy of the proposed model, which includes natural language processing technology, was 89.17%. Overall, an improvement of 9 percentage points was observed.
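The lexicon-based scoring and top-k extraction described in this abstract can be sketched roughly as follows. This is a heavily simplified illustration: the toy English lexicon, the marker sets, and the rule that a marker anywhere in the sentence affects the whole score are all assumptions, and the dependency parsing and morphological analysis used in the paper are not modeled here.

```python
from typing import List, Tuple

# Toy lexicon and marker sets (hypothetical values, for illustration only).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
REVERSAL = {"not"}       # assumed to flip the sign of the sentence score
DEACTIVATION = {"if"}    # assumed to cancel the sentence score


def score(tokens: List[str]) -> float:
    """Sum lexicon scores, then apply the assumed transformation rules."""
    base = sum(LEXICON.get(t, 0.0) for t in tokens)
    if any(t in DEACTIVATION for t in tokens):
        return 0.0
    if any(t in REVERSAL for t in tokens):
        return -base
    return base


def top_k(sentences: List[List[str]], k: int) -> List[Tuple[List[str], float]]:
    """Keep the k sentences with the largest absolute non-zero score."""
    scored = [(s, score(s)) for s in sentences]
    scored = [pair for pair in scored if pair[1] != 0.0]
    scored.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scored[:k]


corpus = [["great", "movie"], ["not", "good"], ["if", "good"], ["a", "movie"]]
for tokens, value in top_k(corpus, 2):
    label = "positive" if value > 0 else "negative"
    print(" ".join(tokens), value, label)
```

Sentences with a zero score are dropped before ranking because they carry no usable label under this scheme.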

Sequence-to-sequence based Morphological Analysis and Part-Of-Speech Tagging for Korean Language with Convolutional Features

Jianri Li, EuiHyeon Lee, Jong-Hyeok Lee

http://doi.org/

Traditional Korean morphological analysis and POS tagging methods usually consist of two steps: (1) generate hypotheses of all possible combinations of morphemes for the given input, and (2) perform POS tagging and search for the optimal result. These methods require additional resources such as dictionaries, and errors in the first step can propagate to the second step. In this paper, we try to solve this problem in an end-to-end fashion using a sequence-to-sequence model with convolutional features. Experimental results on the Sejong corpus show that our approach achieved a 97.15% F1-score at the morpheme level, and 95.33% and 60.62% precision at the word and sentence levels, respectively; [...] a 96.91% F1-score at the morpheme level, and 95.40% and 60.62% precision at the word and sentence levels, respectively.
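For readers unfamiliar with the metrics quoted in this abstract, the sketch below shows one common way to compute a morpheme-level F1-score and a sentence-level exact-match rate. The multiset-overlap definition and the function names are assumptions; the paper's own scoring procedure is not given in this listing and may differ.

```python
from collections import Counter
from typing import List, Tuple

Morph = Tuple[str, str]  # (morpheme, POS tag)


def morpheme_f1(pred: List[Morph], gold: List[Morph]) -> float:
    """F1 over (morpheme, tag) pairs, counted as a multiset overlap."""
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def sentence_accuracy(preds: List[List[Morph]], golds: List[List[Morph]]) -> float:
    """Fraction of sentences whose whole analysis matches the gold analysis."""
    if not golds:
        return 0.0
    exact = sum(1 for p, g in zip(preds, golds) if p == g)
    return exact / len(golds)


pred = [("나", "NP"), ("는", "JX")]
gold = [("나", "NP"), ("는", "JKS"), ("다", "EF")]
print(morpheme_f1(pred, gold))
```

A sentence-level score is much stricter than a morpheme-level one, since a single wrong morpheme makes the whole sentence count as incorrect; this is consistent with the gap between the morpheme-level and sentence-level figures in the abstract.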

Probabilistic Segmentation and Tagging of Unknown Words

Bogyum Kim, Jae Sung Lee

http://doi.org/

Processing unknown words, such as proper nouns and newly coined words, is important for a morphological analyzer that handles documents in various domains. In this study, a segmentation and tagging method for unknown Korean words is proposed for 3-step probabilistic morphological analysis. To guess unknown words, it uses the rich suffixes that attach to open-class words, such as general nouns and proper nouns. We propose a method that learns suffix patterns from a morpheme-tagged corpus and calculates their probabilities for segmenting and tagging unknown open-class words in the probabilistic morphological analysis model. Experimental results showed that unknown word processing performance improves greatly on documents containing many unregistered words.

Syllable-based Korean POS Tagging Based on Combining a Pre-analyzed Dictionary with Machine Learning

Chung-Hee Lee, Joon-Ho Lim, Soojong Lim, Hyun-Ki Kim

http://doi.org/

This study designs a hybrid algorithm for syllable-based Korean POS tagging. Previous syllable-based work on Korean POS tagging has relied on sequence labeling and has mostly used only a machine learning method. We present a new algorithm that integrates a machine learning method with a pre-analyzed dictionary. We used the Sejong tagged corpus for training and evaluation. While the machine learning engine achieved an eojeol precision of 0.964, the proposed hybrid engine achieved an eojeol precision of 0.990. In a Quiz domain test, the machine learning engine and the proposed hybrid engine obtained 0.961 and 0.972, respectively. These results indicate that our method is effective for Korean POS tagging.
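One simple way to combine a pre-analyzed dictionary with a machine learning tagger is a lookup with fallback, sketched below. The function name, the eojeol-level lookup, and the stand-in tagger are assumptions for illustration; the abstract does not say how the paper's hybrid engine actually combines the two components.

```python
from typing import Callable, Dict, List, Tuple

Analysis = List[Tuple[str, str]]  # list of (morpheme, POS tag) pairs


def hybrid_tag(
    eojeols: List[str],
    pre_analyzed: Dict[str, Analysis],
    ml_tagger: Callable[[str], Analysis],
) -> List[Analysis]:
    """Use the dictionary entry when one exists, else fall back to the tagger."""
    results = []
    for eojeol in eojeols:
        if eojeol in pre_analyzed:
            results.append(pre_analyzed[eojeol])
        else:
            results.append(ml_tagger(eojeol))
    return results


# Hypothetical dictionary entry and a stand-in tagger that labels everything NNG.
dictionary = {"나는": [("나", "NP"), ("는", "JX")]}
stub_tagger = lambda eojeol: [(eojeol, "NNG")]
print(hybrid_tag(["나는", "학교"], dictionary, stub_tagger))
```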

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr

Non-autoregressive Korean Morphological Analysis with Word Segment Information
