Digital Library: Search Results
Korean Coreference Resolution through BERT Embedding at the Morpheme Level
Kyeongbin Jo, Yohan Choi, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.6.495
Coreference resolution is a natural language processing task that identifies the mentions in a given document and groups those that refer to the same entity. Korean coreference resolution has mainly been studied with end-to-end models, which must consider every span as a potential mention, so memory usage and time complexity grow quickly. In this paper, a word-level coreference resolution model, which maps sub-tokens back to word units before resolving coreference, was applied to Korean, and the token representations of the word-level model were computed with CorefBERT to reflect the characteristics of Korean. Named-entity and dependency-parsing features were then added. In experiments on the ETRI Q&A domain evaluation set, the model achieved an F1 of 70.68%, a 1.67% improvement over the existing end-to-end coreference resolution model, while memory usage improved by a factor of 2.4 and speed by a factor of 1.82.
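A minimal sketch of the sub-token-to-word mapping step described above, assuming PyTorch tensors and a hypothetical `word_ids` alignment from sub-tokens to original words; averaging is one common pooling choice and not necessarily the exact operation used in the paper.

```python
import torch

def subtokens_to_words(token_embeddings: torch.Tensor, word_ids: list) -> torch.Tensor:
    """Average the sub-token embeddings that belong to the same original word.

    token_embeddings: (num_subtokens, hidden) encoder output
    word_ids: for each sub-token, the index of the word it came from
    """
    num_words = max(word_ids) + 1
    hidden = token_embeddings.size(1)
    word_embeddings = torch.zeros(num_words, hidden)
    counts = torch.zeros(num_words, 1)
    for i, w in enumerate(word_ids):
        word_embeddings[w] += token_embeddings[i]
        counts[w] += 1
    return word_embeddings / counts

# Example: 5 sub-tokens that came from 3 original words
emb = torch.randn(5, 768)
print(subtokens_to_words(emb, [0, 0, 1, 2, 2]).shape)  # torch.Size([3, 768])
```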
Document-level Machine Translation Data Augmentation Using a Cluster Algorithm and NSP
http://doi.org/10.5626/JOK.2023.50.5.401
In recent years, research on document-level machine translation has been actively conducted to understand the context of an entire document and produce natural translations. As with sentence-level models, training a document-level machine translation model requires a large amount of data, but building a large document-level parallel corpus is difficult. Therefore, in this paper we propose a data augmentation technique that is effective for document-level machine translation and mitigates the shortage of document-level parallel corpora. In experiments, applying the proposed augmentation, based on a clustering algorithm and next sentence prediction (NSP), to a context-free sentence-level parallel corpus improved document-level translation performance by 3.0 S-BLEU and 2.7 D-BLEU compared with the un-augmented baseline.
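The abstract does not spell out how the clustering algorithm and NSP are combined; the sketch below is one plausible reading, in which an NSP-style adjacency score decides whether consecutive sentences are chained into a pseudo-document. The `nsp_score` callable and the toy scorer are illustrative stand-ins for a real BERT NSP head.

```python
def build_pseudo_documents(sentences, nsp_score, threshold=0.5, max_len=8):
    """Greedily chain sentences whose NSP score against the previous sentence
    exceeds a threshold, forming pseudo-documents for document-level MT training."""
    documents, current = [], []
    for sent in sentences:
        if current and (nsp_score(current[-1], sent) < threshold or len(current) >= max_len):
            documents.append(current)
            current = []
        current.append(sent)
    if current:
        documents.append(current)
    return documents

# Stand-in scorer for illustration only: real use would call a BERT NSP head.
def toy_nsp_score(a, b):
    overlap = len(set(a.split()) & set(b.split()))
    return overlap / max(len(set(a.split())), 1)

sents = ["the cat sat", "the cat slept", "stock prices fell", "prices fell again"]
print(build_pseudo_documents(sents, toy_nsp_score, threshold=0.3))
# [['the cat sat', 'the cat slept'], ['stock prices fell', 'prices fell again']]
```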
Style Transfer for Chat Language using Unsupervised Machine Translation
Youngjun Jung, Changki Lee, Jeongin Hwang, Hyungjong Noh
http://doi.org/10.5626/JOK.2023.50.1.19
Style transfer is the task of generating text in a target style while preserving the content of text written in a source style. In general, it is assumed that the content is invariant and the style is variable when text is transferred. In the case of chat language, however, existing style transfer models do not train well. In this paper, we propose a method that transfers chat language into written language using a style transfer model based on unsupervised machine translation. This study shows that the transferred results can be used to construct a word transfer dictionary between styles that is usable for style transfer. It also shows that the transferred results can be improved by applying a filtering method that keeps only well-transferred pairs and then training the style transfer model on the filtered pairs with supervised learning.
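A minimal sketch of the filtering step described above, assuming a hypothetical `score_fn` that rates how well a chat-to-written pair preserves content; the toy character-overlap scorer and the threshold are illustrative only, not the paper's criterion.

```python
def filter_transfer_pairs(pairs, score_fn, threshold=0.3):
    """Keep only (chat, written) pairs whose transfer quality passes a threshold,
    so they can be reused as pseudo-parallel data for supervised training."""
    return [(src, tgt) for src, tgt in pairs if score_fn(src, tgt) >= threshold]

# Stand-in scorer for illustration: character overlap as a rough content-preservation proxy.
def toy_score(src, tgt):
    return len(set(src) & set(tgt)) / max(len(set(src) | set(tgt)), 1)

pairs = [
    ("u goin 2moro?", "are you going tomorrow?"),   # content preserved -> kept
    ("brb", "the weather is nice today"),           # content lost -> filtered out
]
print(filter_transfer_pairs(pairs, toy_score))
```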
Korean End-to-End Coreference Resolution with BERT for Long Document
Kyeongbin Jo, Youngjun Jung, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.1.32
Coreference resolution is a natural language processing task that identifies mentions in a given document and groups together those that refer to the same entity. Recently, coreference resolution has mainly been studied with end-to-end models that use BERT to derive contextual representations of words while performing mention detection and coreference resolution simultaneously. However, BERT suffers reduced performance on long documents because of its input length limit. Therefore, this paper proposes the following model: a long document is first split into segments of 512 tokens or fewer, each segment is encoded with an existing local BERT to obtain first-pass contextual word representations, and the segments are then recombined and a global positional embedding is computed and added for the original document. Finally, coreference resolution is performed after computing document-wide contextual representations with a Global BERT layer. Experimental results show that the proposed model achieves performance similar to the existing model while reducing GPU memory usage by a factor of 1.4 and improving speed by a factor of 2.1.
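A minimal PyTorch sketch of the split-encode-recombine idea described above. The `nn.TransformerEncoderLayer` modules stand in for the local BERT and the Global BERT layer, and the segment length, hidden size, and maximum document length are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class GlobalOverLocal(nn.Module):
    """Sketch: a local encoder sees <=512-token segments, then global position
    embeddings are added and a global layer attends over the recombined document."""
    def __init__(self, hidden=768, max_doc_len=4096, segment_len=512):
        super().__init__()
        self.segment_len = segment_len
        self.local_encoder = nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True)   # stand-in for local BERT
        self.global_pos = nn.Embedding(max_doc_len, hidden)
        self.global_layer = nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True)    # stand-in for Global BERT layer

    def forward(self, doc_embeddings: torch.Tensor) -> torch.Tensor:
        # doc_embeddings: (doc_len, hidden) sub-token embeddings for one document
        segments = doc_embeddings.split(self.segment_len, dim=0)
        local = torch.cat([self.local_encoder(seg.unsqueeze(0)).squeeze(0) for seg in segments], dim=0)
        positions = torch.arange(local.size(0))
        return self.global_layer((local + self.global_pos(positions)).unsqueeze(0)).squeeze(0)

model = GlobalOverLocal()
print(model(torch.randn(1300, 768)).shape)  # torch.Size([1300, 768])
```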
Korean Text Summarization using MASS with Copying and Coverage Mechanism and Length Embedding
Youngjun Jung, Changki Lee, Wooyoung Go, Hanjun Yoon
http://doi.org/10.5626/JOK.2022.49.1.25
Text summarization is a technology that generates a summary containing the important and essential information of a given document, and end-to-end abstractive summarization models based on sequence-to-sequence architectures are the main line of study. Recently, transfer learning, fine-tuning a model pre-trained on large-scale monolingual data, has been actively studied in natural language processing. In this paper, we added a copying mechanism to the MASS model, pre-trained it for Korean language generation, and then applied it to Korean text summarization. A coverage mechanism and length embedding were additionally applied to improve the summarization model. Experiments showed that the Korean text summarization model combining MASS with the copying and coverage mechanisms outperformed existing models, and that the summary length could be controlled through the length embedding.
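As a rough illustration of the copying mechanism, the sketch below mixes the decoder's vocabulary distribution with attention mass copied onto the source token ids, pointer-generator style; the tensor shapes and the `p_gen` value are illustrative assumptions, and the coverage mechanism (which penalizes repeatedly attended source positions) is omitted for brevity.

```python
import torch

def copy_distribution(p_vocab, attention, src_token_ids, p_gen):
    """Mixture of generation and copying at one decoding step.

    p_vocab:       (vocab_size,) decoder softmax over the vocabulary
    attention:     (src_len,)   attention over source tokens at the same step
    src_token_ids: (src_len,)   vocabulary ids of the source tokens
    p_gen:         scalar in [0, 1], probability of generating vs. copying
    """
    final = p_gen * p_vocab
    # Scatter copying probability mass onto the vocabulary ids of the source tokens.
    final = final.scatter_add(0, src_token_ids, (1.0 - p_gen) * attention)
    return final

vocab_size, src_len = 10, 4
p_vocab = torch.softmax(torch.randn(vocab_size), dim=0)
attention = torch.softmax(torch.randn(src_len), dim=0)
src_ids = torch.tensor([2, 5, 5, 7])
print(copy_distribution(p_vocab, attention, src_ids, p_gen=0.7).sum())  # ~1.0
```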
English-Korean Neural Machine Translation using MASS with Relative Position Representation
Youngjun Jung, Cheoneum Park, Changki Lee, Junseok Kim
http://doi.org/10.5626/JOK.2020.47.11.1038
Neural machine translation has mainly been studied with sequence-to-sequence models trained by supervised learning. However, supervised learning performs poorly when data is scarce, so transfer learning, fine-tuning a model such as BERT or MASS that was pre-trained on large amounts of monolingual data, has recently been widely studied in natural language processing. In this paper, MASS, which uses a pre-training objective designed for language generation, was applied to English-Korean machine translation. In experiments, the English-Korean machine translation model using MASS outperformed the existing models, and performance improved further when the relative position representation method was applied to MASS.
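As a rough illustration of the relative position representation idea, the sketch below adds a learned bias indexed by the clipped distance between positions to the attention logits. This is a simplified single-head variant, not the authors' implementation, and the dimensions (`hidden=64`, `max_rel_dist=16`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelativePositionAttention(nn.Module):
    """Single-head self-attention with a learned relative position bias:
    a bias indexed by the clipped distance j - i is added to the attention logits."""
    def __init__(self, hidden=64, max_rel_dist=16):
        super().__init__()
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.max_rel_dist = max_rel_dist
        self.rel_bias = nn.Embedding(2 * max_rel_dist + 1, 1)

    def forward(self, x):                                   # x: (seq_len, hidden)
        seq_len, hidden = x.shape
        scores = self.q(x) @ self.k(x).T / hidden ** 0.5    # content-based logits
        pos = torch.arange(seq_len)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        scores = scores + self.rel_bias(rel + self.max_rel_dist).squeeze(-1)
        return torch.softmax(scores, dim=-1) @ self.v(x)

attn = RelativePositionAttention()
print(attn(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```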
Korean Semantic Role Labeling with BERT
Jangseong Bae, Changki Lee, Soojong Lim, Hyunki Kim
http://doi.org/10.5626/JOK.2020.47.11.1021
Semantic role labeling is a natural language processing task that identifies relationships such as "who, what, how, and why" within a sentence. Semantic role labeling studies mainly use machine learning algorithms and end-to-end methods that exclude hand-crafted feature information. Recently, a language model called BERT (Bidirectional Encoder Representations from Transformers) has emerged in natural language processing, outperforming previous state-of-the-art models. Since the performance of end-to-end semantic role labeling depends mainly on the structure of the machine learning model or the pre-trained language model, in this paper we apply BERT to Korean semantic role labeling to improve its performance. As a result, the Korean semantic role labeling model using BERT achieves 85.77%, better than the existing Korean semantic role labeling models.
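A minimal sketch of semantic role labeling framed as per-token sequence tagging over a BERT-like encoder; the `nn.TransformerEncoderLayer` is a stand-in for BERT, and the number of BIO role labels is a made-up placeholder, not the paper's label set.

```python
import torch
import torch.nn as nn

class SRLTagger(nn.Module):
    """Sketch of end-to-end SRL as sequence tagging: an encoder (stand-in for BERT)
    followed by a per-token classifier over BIO-encoded role labels (e.g., B-ARG0, I-ARG1, O)."""
    def __init__(self, hidden=768, num_labels=23):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True)  # stand-in for BERT
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, token_embeddings):          # (batch, seq_len, hidden)
        return self.classifier(self.encoder(token_embeddings))

model = SRLTagger()
logits = model(torch.randn(2, 20, 768))
print(logits.argmax(-1).shape)  # torch.Size([2, 20]) predicted label id per token
```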
Korean End-to-end Neural Coreference Resolution with BERT
Kihun Kim, Cheoneum Park, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2020.47.10.942
Coreference resolution is a natural language task that identifies the mentions subject to coreference resolution in a given document and finds and clusters the mentions of the same entity. For Korean coreference resolution, two approaches have been used: an end-to-end model that performs mention detection and mention clustering simultaneously, and a pointer network based on an encoder-decoder model. The BERT model released by Google has been applied to natural language processing tasks and has demonstrated many performance improvements. In this paper, we propose a Korean end-to-end neural coreference resolution model with BERT. The model uses KorBERT, pre-trained on Korean data, and incorporates dependency parsing results and named entity recognition features to reflect the structural and semantic characteristics of Korean. Experimental results on the ETRI Q&A domain data set show a CoNLL F1 of 71.00% (DEV) and 69.01% (TEST), higher than previous studies.
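A minimal sketch of the mention-scoring component of an end-to-end coreference model in the style of Lee et al. (2017), on which such models are typically based; the boundary-only span representation and the feed-forward scorer are simplifications, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MentionScorer(nn.Module):
    """Sketch: a candidate span is represented by its boundary token embeddings,
    and a feed-forward scorer ranks how likely the span is to be a mention."""
    def __init__(self, hidden=768):
        super().__init__()
        self.ffnn = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def span_score(self, token_embeddings, start, end):
        span = torch.cat([token_embeddings[start], token_embeddings[end]], dim=-1)
        return self.ffnn(span)

scorer = MentionScorer()
tokens = torch.randn(30, 768)                    # contextual embeddings from a BERT-like encoder
candidates = [(0, 1), (4, 6), (10, 10)]          # candidate (start, end) spans
scores = torch.stack([scorer.span_score(tokens, s, e) for s, e in candidates])
print(scores.squeeze(-1))                        # higher = more likely to be a mention
```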
Korean Text Summarization using MASS with Relative Position Representation
Youngjun Jung, Hyunsun Hwang, Changki Lee
http://doi.org/10.5626/JOK.2020.47.9.873
In language generation, deep learning models that generate natural language with sequence-to-sequence architectures are being actively studied. In text summarization, beyond extractive methods that select only the core sentences of a text, abstractive summarization is being studied. Recently, transfer learning, fine-tuning a model such as BERT or MASS pre-trained on a large amount of monolingual data, has been mainly studied in natural language processing. In this paper, after pre-training MASS for Korean language generation, it was applied to Korean text summarization. In experiments, the Korean text summarization model using MASS outperformed the existing models. Additionally, the performance of the summarization model was further improved by applying the relative position representation method to MASS.
Korean Movie Review Sentiment Analysis using Self-Attention and Contextualized Embedding
Cheoneum Park, Dongheon Lee, Kihoon Kim, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2019.46.9.901
Sentiment analysis is the task of collecting and classifying opinions about a specific object. Human subjectivity expressed in natural language is difficult to capture, so existing sentiment word dictionaries and probabilistic models cannot solve this task well, but the development of deep learning has made it possible. Self-attention models a given input sequence by computing attention weights of the input sequence with itself and constructing context vectors as weighted sums; within that context, high weights are assigned between words with similar meanings. In this paper, we propose a method for sentiment analysis that uses a modeling network with self-attention and pre-trained contextualized embeddings. The experimental result shows an accuracy of 89.82%.
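A minimal sketch of the self-attention computation described above: attention weights between the sequence and itself, followed by a weighted sum that forms each position's context vector. Dimensions are illustrative, and real models add learned projections and multiple heads.

```python
import torch

def self_attention(x):
    """Scaled dot-product self-attention over a single sequence."""
    scores = x @ x.T / x.size(-1) ** 0.5        # (seq_len, seq_len) similarity of the sequence with itself
    weights = torch.softmax(scores, dim=-1)     # each row sums to 1
    return weights @ x                          # (seq_len, hidden) context vectors

x = torch.randn(7, 16)                          # 7 word embeddings of dimension 16
print(self_attention(x).shape)                  # torch.Size([7, 16])
```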