Digital Library: Search Results
English-Korean Neural Machine Translation using MASS with Relative Position Representation
Youngjun Jung, Cheoneum Park, Changki Lee, Junseok Kim
http://doi.org/10.5626/JOK.2020.47.11.1038
Neural machine translation has mainly been studied with sequence-to-sequence models trained by supervised learning. However, because supervised learning performs poorly when data are scarce, transfer learning, which fine-tunes a model pre-trained on large amounts of monolingual data, such as BERT and MASS, has recently become the dominant approach in natural language processing. In this paper, MASS, a pre-training method for language generation, is applied to English-Korean machine translation. Experimental results show that the English-Korean translation model using MASS outperforms the existing models, and that applying the relative position representation method to MASS improves performance further.
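For readers unfamiliar with the technique, the following is a minimal sketch of relative position representation in self-attention (Shaw et al., 2018), the method the abstract says is added to MASS; the NumPy implementation and all names here are illustrative, not the paper's code.

```python
# Hedged sketch (assumed implementation) of self-attention with learned
# relative position embeddings, as in Shaw et al. (2018).
import numpy as np

def relative_attention(Q, K, V, rel_emb, max_dist=4):
    """Single-head attention with relative position embeddings.

    Q, K, V: (seq_len, d) query/key/value matrices.
    rel_emb: (2*max_dist+1, d) embeddings for clipped relative distances.
    """
    n, d = Q.shape
    # Relative distance matrix j - i, clipped to [-max_dist, max_dist],
    # then shifted to index into rel_emb.
    idx = np.arange(n)
    rel = np.clip(idx[None, :] - idx[:, None], -max_dist, max_dist) + max_dist
    A = rel_emb[rel]                              # (n, n, d)
    # Content term q_i . k_j plus position term q_i . a_ij.
    scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, A)) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(6, 8))
rel_emb = rng.normal(size=(9, 8))  # 2*4+1 possible clipped distances
print(relative_attention(Q, K, V, rel_emb).shape)  # (6, 8)
```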
Korean End-to-end Neural Coreference Resolution with BERT
Kihun Kim, Cheoneum Park, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2020.47.10.942
Coreference resolution is a natural language processing task that identifies mentions in a given document and clusters those that refer to the same entity. For Korean coreference resolution, prior work has used an end-to-end model that performs mention detection and mention clustering simultaneously, as well as a pointer network based on an encoder-decoder model. The BERT model released by Google has been applied to many natural language processing tasks and has brought large performance improvements. In this paper, we propose a Korean end-to-end neural coreference resolution model with BERT. The model uses KorBERT, pre-trained on Korean data, and incorporates dependency parsing results and named entity recognition features to reflect the structural and semantic characteristics of Korean. Experimental results show CoNLL F1 of 71.00% (DEV) and 69.01% (TEST) on the ETRI Q&A domain dataset, higher than previous studies.
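The end-to-end formulation referenced above builds on the mention-scoring and antecedent-scoring idea of Lee et al. (2017); the sketch below illustrates that scoring scheme under assumed shapes and weight names, with KorBERT-based span representations stubbed by random vectors.

```python
# Hedged sketch of end-to-end coreference scoring (Lee et al., 2017 style):
# each span gets a mention score, each earlier span a pair score, and the
# best-scoring antecedent (or a dummy) is chosen. Names are illustrative.
import numpy as np

def best_antecedents(reps, w_m, w_a):
    """For span i, argmax over {dummy} U {j < i} of s_m(i)+s_m(j)+s_a(i,j)."""
    n, d = reps.shape
    s_m = reps @ w_m                        # mention score per span
    links = []
    for i in range(n):
        best, best_score = -1, 0.0          # dummy antecedent scores 0
        for j in range(i):
            s_a = reps[i] @ w_a @ reps[j]   # bilinear pair score
            score = s_m[i] + s_m[j] + s_a
            if score > best_score:
                best, best_score = j, score
        links.append(best)                  # -1 means "no antecedent"
    return links

rng = np.random.default_rng(1)
reps = rng.normal(size=(5, 16))             # stand-in for KorBERT span reps
print(best_antecedents(reps, rng.normal(size=16), rng.normal(size=(16, 16))))
```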
Korean Movie Review Sentiment Analysis using Self-Attention and Contextualized Embedding
Cheoneum Park, Dongheon Lee, Kihoon Kim, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2019.46.9.901
Sentiment analysis is the task of collecting and classifying opinions about a specific object. Because the subjectivity expressed in natural language is hard to capture, existing sentiment word dictionaries and probabilistic models could not solve the task well; advances in deep learning have made it tractable. Self-attention models a given input sequence by computing attention weights of the sequence against itself and building a context vector as a weighted sum, so high weights are assigned between words with similar meanings. In this paper, we propose a method that combines a self-attention network with pre-trained contextualized embeddings to solve the sentiment analysis task. The experimental result shows an accuracy of 89.82%.
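A minimal sketch of the self-attention pooling described above, assuming a single scoring vector over contextualized token embeddings; the embeddings and classifier head below are random stand-ins, not the paper's model.

```python
# Hedged sketch: attention weights over the sequence itself, then a
# weighted sum forms the context vector fed to a sentiment classifier.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(H, w):
    """H: (seq_len, d) contextualized embeddings; w: (d,) scoring vector."""
    alpha = softmax(H @ w)          # attention weight per token
    return alpha @ H                # (d,) context vector (weighted sum)

rng = np.random.default_rng(2)
H = rng.normal(size=(10, 32))       # stand-in for contextualized embeddings
c = self_attention_pool(H, rng.normal(size=32))
logit = c @ rng.normal(size=32)     # stand-in binary sentiment head
print("positive" if logit > 0 else "negative")
```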
Coreference Resolution using Multi-resolution Pointer Networks
Cheoneum Park, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2019.46.4.334
Multi-resolution RNN models parallel sequences with RNNs. Coreference resolution is a natural language processing task that groups the words referring to the same entity in a document into one cluster, and it can be solved with a pointer network. With a pointer network, the encoder input sequence for coreference resolution consists of all morphemes of the document, and the decoder input sequence consists of all nouns in the document. In this paper, we propose three multi-resolution pointer network models that encode the morpheme sequence and the noun list of a document in parallel and use both encoded hidden states in the decoder, and we solve coreference resolution with the proposed models. Experimental results show CoNLL F1 of 71.44% for Multi-resolution1, 70.52% for Multi-resolution2, and 70.59% for Multi-resolution3.
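The following sketch shows one pointer-network decoding step with two parallel encodings, roughly in the spirit of the multi-resolution models above; how the two resolutions are combined here (element-wise sum over aligned positions) is an illustrative assumption, not necessarily the paper's Multi-resolution1/2/3 variants.

```python
# Hedged sketch: one decoder step attends over fused morpheme-level and
# noun-level encoder states and outputs a pointer distribution.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_step(dec_state, enc_morph, enc_noun, W1, W2, v):
    """Return a distribution over encoder positions for one decoder step."""
    # Illustrative fusion: assume noun states are aligned to morpheme
    # positions and combine them by element-wise sum.
    enc = enc_morph + enc_noun
    scores = np.tanh(enc @ W1 + dec_state @ W2) @ v   # additive attention
    return softmax(scores)                            # pointer distribution

rng = np.random.default_rng(3)
n, d = 8, 16
p = pointer_step(rng.normal(size=d),
                 rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                 rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                 rng.normal(size=d))
print(p.argmax(), p.sum())  # pointed-to position; weights sum to 1
```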
Korean Machine Reading Comprehension using S³-Net based on Position Encoding
Cheoneum Park, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2019.46.3.234
S³-Net is a deep learning model for machine reading comprehension question answering (MRQA) based on simple recurrent units (SRU) and a self-matching network that computes attention weights over its own RNN sequence. In MRQA, the answer to a question occurs within the passage; because a passage consists of several sentences, the input sequence grows long and performance degrades. In this paper, to address this long-context degradation, we propose a hierarchical model that adds sentence-level encoding, and an S³-Net that applies position encoding to capture word-order information. Experimental results show that the proposed S³-Net achieves 69.43% EM and 81.53% F1 as a single model, and 71.28% EM and 82.67% F1 as an ensemble.
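The position encoding mentioned above is commonly the sinusoidal form of Vaswani et al. (2017); the sketch below shows that standard form, though the exact variant used in S³-Net may differ.

```python
# Minimal sketch of sinusoidal position encoding:
# PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...).
import numpy as np

def position_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

X = np.random.default_rng(4).normal(size=(20, 64))  # stand-in embeddings
X = X + position_encoding(20, 64)                   # inject order information
print(X.shape)
```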
Korean Machine Reading Comprehension with S²-Net
Cheoneum Park, Changki Lee, Sulyn Hong, Yigyu Hwang, Taejoon Yoo, Hyunki Kim
http://doi.org/10.5626/JOK.2018.45.12.1260
Machine reading comprehension is the task of understanding a given context and identifying the right answer within it. The simple recurrent unit (SRU) alleviates the vanishing gradient problem of recurrent neural networks (RNN) using neural gates, as in the gated recurrent unit (GRU), and improves speed by removing the previous hidden state from the gate input. The self-matching network used in R-NET computes attention weights over its own RNN sequence, which has an effect similar to coreference resolution in surfacing semantically related context. In this paper, we propose an S²-Net model that adds a self-matching layer to an encoder built from stacked SRUs, and we construct a Korean machine reading comprehension dataset. Experimental results show that the proposed S²-Net achieves 70.81% EM and 82.48% F1 on Korean machine reading comprehension.
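The SRU recurrence described above, with gates computed from the current input only, can be sketched as follows; this is the published SRU formulation (Lei et al., 2017) in minimal NumPy, not the paper's implementation.

```python
# Hedged sketch of a single SRU layer: because the forget gate F and reset
# gate R depend only on the current input, all matrix products can be
# computed in parallel over time; only the light recurrence on c is serial.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(X, W, Wf, bf, Wr, br):
    """X: (T, d) inputs; returns (T, d) hidden states."""
    Xt, F, R = X @ W, sigmoid(X @ Wf + bf), sigmoid(X @ Wr + br)
    c, H = np.zeros(X.shape[1]), []
    for t in range(X.shape[0]):
        c = F[t] * c + (1 - F[t]) * Xt[t]                # internal state
        H.append(R[t] * np.tanh(c) + (1 - R[t]) * X[t])  # highway output
    return np.stack(H)

rng = np.random.default_rng(5)
T, d = 12, 16
H = sru_layer(rng.normal(size=(T, d)),
              rng.normal(size=(d, d)), rng.normal(size=(d, d)),
              np.zeros(d), rng.normal(size=(d, d)), np.zeros(d))
print(H.shape)  # (12, 16)
```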
Korean Dependency Parsing using Pointer Networks
http://doi.org/10.5626/JOK.2017.44.8.822
In this paper, we propose a Korean dependency parsing model using pointer networks with multi-task learning. Multi-task learning improves performance by learning two or more problems at the same time; here, we perform dependency parsing with pointer networks while obtaining the dependency relation and the dependency label of each word simultaneously. We define five input criteria for morpheme-level multi-task learning with pointer networks in word-level dependency parsing, and we apply fine-tuning to further improve performance. Experimental results show that the proposed model achieves 91.79% UAS and 89.48% LAS, outperforming conventional Korean dependency parsers.
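A hedged sketch of the multi-task decoding step: the same decoder state both points at the head word and classifies the dependency label. Shapes, weight names, and the dot-product pointer scoring are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch: joint head prediction (pointer over encoder states) and
# dependency-label classification from one decoder state.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def parse_step(dec_state, enc_states, W_label):
    """Jointly predict the head index and label for the current word."""
    head_dist = softmax(enc_states @ dec_state)        # pointer over heads
    head = int(head_dist.argmax())
    pair = np.concatenate([dec_state, enc_states[head]])
    label_dist = softmax(W_label @ pair)               # dependency label
    return head, int(label_dist.argmax())

rng = np.random.default_rng(8)
n, d, labels = 6, 16, 4
head, label = parse_step(rng.normal(size=d), rng.normal(size=(n, d)),
                         rng.normal(size=(labels, 2 * d)))
print(head, label)
```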
Mention Detection with Pointer Networks
http://doi.org/10.5626/JOK.2017.44.8.774
A mention takes a noun or noun phrase as its head and forms a chunk of text that carries a meaning, including modifiers; mention detection is the extraction of such mentions from a document, and coreference resolution then determines which mentions refer to the same entity. A pointer network is a recurrent neural network (RNN) encoder-decoder model that outputs a list of elements corresponding to positions in the input sequence. In this paper, we propose mention detection using pointer networks. The proposed model can detect overlapped mentions, a problem that sequence labeling cannot solve. In our experiments, the proposed mention detection model achieved an F1 of 80.07%, 7.65%p higher than rule-based mention detection; coreference resolution using this mention detection achieved CoNLL F1 of 52.67% (mention boundary) and 60.11% (head boundary), 7.68%p and 1.5%p higher, respectively, than coreference resolution using rule-based mention detection.
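The sketch below illustrates why a pointer decoder can produce overlapped mentions where BIO-style sequence labeling cannot: each span is predicted independently, so spans may share tokens. The decoding scheme shown (pointing from a head word to a span start) is an illustrative simplification.

```python
# Hedged sketch: for each head word, point at the most likely mention
# start; the head closes the span. Independent spans may overlap.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mention_spans(enc, heads, W):
    """For each head index, point to the most likely mention start."""
    spans = []
    for h in heads:
        query = enc[h] @ W                  # decoder query for this head
        start = int(softmax(enc @ query).argmax())
        spans.append((min(start, h), h))    # span = [start, head]
    return spans

rng = np.random.default_rng(6)
enc = rng.normal(size=(10, 16))             # encoder states for 10 tokens
print(mention_spans(enc, heads=[4, 7], W=rng.normal(size=(16, 16))))
# e.g. [(2, 4), (2, 7)] -- spans can share tokens, unlike BIO labeling
```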
Coreference Resolution for Korean Pronouns using Pointer Networks
A pointer network is a deep learning model based on a recurrent neural network (RNN) with an attention mechanism that outputs a list of elements corresponding to positions in the input sequence. Coreference resolution for pronouns is the natural language processing task of finding, for each pronoun in a document, the antecedents that refer to the same entity. In this paper, we propose a pronoun coreference resolution method that finds the relation between antecedents and pronouns using pointer networks, together with input methods for the pointer network, namely the chaining order between the words in an entity. Among the proposed methods, the chaining order Coref2 showed the best performance, with a MUC F1 of 81.40%, which is 31.00%p and 19.28%p higher than rule-based (50.40%) and statistics-based (62.12%) coreference resolution systems for Korean pronouns, respectively.
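One way to picture the chaining idea is a decoder that repeatedly points from the current word to the next word in the entity chain; the sketch below shows that generic decoding loop only. The specific chaining orders the paper compares (e.g., Coref2) concern how such chains are linearized, which this sketch does not reproduce.

```python
# Hedged sketch: starting from the pronoun, repeatedly point at the next
# antecedent word; pointing at oneself ends the chain. max_steps bounds
# the loop in case the decoder never self-points.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_chain(enc, pronoun_idx, W, max_steps=5):
    chain, cur = [pronoun_idx], pronoun_idx
    for _ in range(max_steps):
        nxt = int(softmax(enc @ (W @ enc[cur])).argmax())
        if nxt == cur:                 # pointing at itself ends the chain
            break
        chain.append(nxt)
        cur = nxt
    return chain

rng = np.random.default_rng(7)
enc = rng.normal(size=(12, 16))        # stand-in encoder states
print(decode_chain(enc, pronoun_idx=11, W=rng.normal(size=(16, 16))))
```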
Korean Coreference Resolution using the Multi-pass Sieve
Cheon-Eum Park, Kyoung-Ho Choi, Changki Lee
Coreference resolution finds all expressions that refer to the same entity in a document and is important for information extraction, document classification, document summarization, and question answering systems. In this paper, we adapt Stanford's multi-pass sieve system, one of the best rule-based coreference resolution models, to Korean. We consider all noun phrases as mentions. Unlike Stanford's system, we use the dependency parse tree for mention extraction and build a Korean acronym list dynamically. In addition, we propose a method that calculates weights by applying the transitive property of centers from centering theory when resolving Korean pronouns. Experiments show that our system obtains MUC 59.0%, B³ 59.5%, CEAFe 63.5%, and CoNLL (mean) 60.7%.
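The control flow of a multi-pass sieve can be sketched as an ordered cascade of matchers, each merging clusters before the next, lower-precision sieve runs; the two sieves below are simplified stand-ins for the system's actual rules.

```python
# Hedged sketch of multi-pass sieve control flow: apply sieves in order of
# precision, merging mention clusters after each pass.
def exact_match_sieve(m1, m2):
    return m1["text"] == m2["text"]

def head_match_sieve(m1, m2):
    return m1["head"] == m2["head"]

def multi_pass_sieve(mentions, sieves):
    clusters = [[m] for m in mentions]          # each mention starts alone
    for sieve in sieves:                        # highest precision first
        merged = []
        for cluster in clusters:
            for target in merged:
                if any(sieve(a, b) for a in cluster for b in target):
                    target.extend(cluster)      # merge into earlier cluster
                    break
            else:
                merged.append(cluster)
        clusters = merged
    return clusters

mentions = [{"text": "박 대통령", "head": "대통령"},   # "President Park"
            {"text": "대통령", "head": "대통령"},      # "the President"
            {"text": "박 대통령", "head": "대통령"}]
print(len(multi_pass_sieve(mentions, [exact_match_sieve, head_match_sieve])))
# -> 1: the exact-match pass merges the duplicates, head match the rest
```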