Search results for author: Hyunki Kim (9)

Korean Semantic Role Labeling with BERT

Jangseong Bae, Changki Lee, Soojong Lim, Hyunki Kim

http://doi.org/10.5626/JOK.2020.47.11.1021

Semantic role labeling is a natural language processing task that identifies relationships such as "who, what, how, and why" within a sentence. Research on semantic role labeling mainly uses machine learning algorithms and end-to-end methods that do without hand-crafted feature information. Recently, a language model called BERT (Bidirectional Encoder Representations from Transformers) has emerged in natural language processing and has outperformed previous state-of-the-art models in the field. The performance of end-to-end semantic role labeling is mainly influenced by the structure of the machine learning model or the pre-trained language model. Thus, in this paper, we apply BERT to Korean semantic role labeling to improve its performance. As a result, the Korean semantic role labeling model using BERT achieves 85.77%, which is better than the existing Korean semantic role labeling model.
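The abstract does not say how BERT's subword tokens are matched to word-level SRL labels. A common approach when fine-tuning BERT for token-level tagging, assumed here rather than taken from the paper, is to keep the label only on the first subword of each word and ignore the rest. The sketch below shows that alignment step in plain Python; the function name `align_labels`, the label strings, and the example segmentation are hypothetical.

```python
from typing import List, Optional


def align_labels(words: List[List[str]], labels: List[str]) -> List[Optional[str]]:
    """Spread word-level SRL labels over subword tokens (hypothetical helper).

    words[i] holds the subword pieces of word i; the tokenizer that produced
    them is assumed, not specified. The first piece of each word keeps the
    word's label and the remaining pieces get None, so only one prediction
    per word is scored.
    """
    if len(words) != len(labels):
        raise ValueError("exactly one label per word is required")
    aligned: List[Optional[str]] = []
    for pieces, label in zip(words, labels):
        for i in range(len(pieces)):
            aligned.append(label if i == 0 else None)
    return aligned


print(align_labels([["철수", "가"], ["사과", "를"], ["먹었다"]], ["ARG0", "ARG1", "O"]))
# prints ['ARG0', None, 'ARG1', None, 'O']
```

During training, the positions marked None would typically be excluded from the loss.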

Korean End-to-end Neural Coreference Resolution with BERT

Kihun Kim, Cheoneum Park, Changki Lee, Hyunki Kim

http://doi.org/10.5626/JOK.2020.47.10.942

Coreference resolution is a natural language processing task that detects the mentions in a given document that are subject to coreference and groups the mentions of the same entity into clusters. For Korean coreference resolution, previous work used either an end-to-end model that performs mention detection and mention clustering simultaneously, or a pointer network based on an encoder-decoder model. The BERT model released by Google has been applied to many natural language processing tasks and has yielded substantial performance improvements. In this paper, we propose Korean end-to-end neural coreference resolution with BERT. The model uses KorBERT, pre-trained on Korean data, and applies dependency parsing results and named entity recognition features to reflect the structural and semantic characteristics of the Korean language. Experimental results on the ETRI Q&A domain dataset show CoNLL F1 scores of 71.00% (DEV) and 69.01% (TEST), higher than those of previous studies.
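For reference, the CoNLL F1 reported above follows the CoNLL-2012 shared task convention: it is the unweighted mean of the F1 scores of three coreference metrics (MUC, B³, and CEAF_e), each computed from that metric's own precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
\text{CoNLL F1} = \frac{F_1^{\mathrm{MUC}} + F_1^{B^3} + F_1^{\mathrm{CEAF}_e}}{3}
```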

Korean Movie Review Sentiment Analysis using Self-Attention and Contextualized Embedding

Cheoneum Park, Dongheon Lee, Kihoon Kim, Changki Lee, Hyunki Kim

http://doi.org/10.5626/JOK.2019.46.9.901

Sentiment analysis is the task of collecting and classifying opinions about a specific object. Because subjective expressions in natural language are difficult to interpret, existing sentiment lexicons and probabilistic models could not adequately solve this task, but advances in deep learning have made it possible. Self-attention models a given input sequence by computing attention weights over the sequence itself and constructing a context vector as a weighted sum, so that words with similar meanings in the context receive high weights. In this paper, we propose a method that combines a modeling network based on self-attention with pre-trained contextualized embeddings to solve the sentiment analysis task. Experimental results show an accuracy of 89.82%.
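The abstract describes self-attention as computing attention weights over the input sequence itself and forming each context vector as a weighted sum. A minimal sketch of that computation in plain Python follows. It assumes scaled dot-product scoring and omits the learned query/key/value projections and the pre-trained contextualized embeddings that a real model, including the one proposed in the paper, would use.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product self-attention with no learned projections.

    Each position i scores every position j of the same sequence by
    x_i . x_j / sqrt(d), turns the scores into weights with softmax, and takes
    the weighted sum of the input vectors as its context vector.
    Returns (context_vectors, attention_weights).
    """
    d = len(seq[0])
    contexts, weights = [], []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in seq]
        w = softmax(scores)
        contexts.append([sum(wj * vec[k] for wj, vec in zip(w, seq)) for k in range(d)])
        weights.append(w)
    return contexts, weights


ctx, att = self_attention([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

In this toy input the first two vectors are similar, so the first position gives more weight to the second position than to the third, which is the "high weight between words with similar meanings" effect the abstract describes.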

Coreference Resolution using Multi-resolution Pointer Networks

Cheoneum Park, Changki Lee, Hyunki Kim

http://doi.org/10.5626/JOK.2019.46.4.334

Multi-resolution RNN is a method of modeling parallel sequences with RNNs. Coreference resolution is a natural language processing task in which words in a document that refer to the same entity are grouped into one cluster, and it can be solved with a pointer network. When a pointer network is used for coreference resolution, the encoder input sequence consists of all morphemes in the document, and the decoder input sequence consists of all nouns in the document. In this paper, we propose three multi-resolution pointer network models that encode all morphemes and the noun list of a document in parallel and use both encoded hidden states in the decoder, and we apply them to coreference resolution. Experimental results show CoNLL F1 scores of 71.44% for Multi-resolution1, 70.52% for Multi-resolution2, and 70.59% for Multi-resolution3.
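A pointer network decodes by producing, at each step, a probability distribution over encoder positions rather than over a fixed vocabulary. The sketch below shows one such step using the additive scoring of the original pointer-network formulation, u_j = v^T tanh(W1 e_j + W2 d). It is an illustrative assumption: it uses a single encoder and does not reproduce how the paper's three multi-resolution variants combine the morpheme and noun encodings. All weights are supplied by the caller.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(enc: List[Vector], dec: Vector,
                         w1: Matrix, w2: Matrix, v: Vector) -> Vector:
    """One pointer-network decoding step.

    Scores each encoder state e_j against the decoder state d with
    u_j = v . tanh(W1 e_j + W2 d) and normalizes with softmax, giving a
    distribution over input positions (e.g. candidate antecedent morphemes).
    """
    dec_proj = matvec(w2, dec)
    scores = []
    for e in enc:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w1, e), dec_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


identity = [[1.0, 0.0], [0.0, 1.0]]
p = pointer_distribution([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], identity, identity, [1.0, 1.0])
pointed = max(range(len(p)), key=p.__getitem__)  # index the decoder points at
```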

Korean Machine Reading Comprehension using S³-Net based on Position Encoding

Cheoneum Park, Changki Lee, Hyunki Kim

http://doi.org/10.5626/JOK.2019.46.3.234

S³-Net is a deep learning model for machine reading comprehension question answering (MRQA) based on the Simple Recurrent Unit (SRU) and Self-Matching Networks, which compute attention weights over their own RNN sequence. In MRQA, the answer to a question appears within a passage; because a passage consists of several sentences, the input sequence becomes long and performance deteriorates. To address this degradation over long contexts, this paper proposes a hierarchical model that adds sentence-level encoding, together with an S³-Net that applies position encoding to capture word order information. Experimental results show that the proposed S³-Net achieves 69.43% EM and 81.53% F1 as a single model, and 71.28% EM and 82.67% F1 as an ensemble.
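The abstract does not specify the form of position encoding. As an assumption, the sketch below uses the sinusoidal encoding popularized by the Transformer, which assigns each position a fixed vector that is added to the word embedding at that position so the model can distinguish word order. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sinusoidal_position_encoding(length: int, dim: int) -> List[Vector]:
    """PE[pos][2i] = sin(pos / 10000**(2i/dim)), PE[pos][2i+1] = cos(same angle)."""
    table = []
    for pos in range(length):
        row = []
        for k in range(dim):
            angle = pos / (10000 ** ((k - k % 2) / dim))  # k - k % 2 == 2i
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_position_encoding(embeddings: List[Vector]) -> List[Vector]:
    """Add the encoding for position t to the embedding of the t-th word."""
    pe = sinusoidal_position_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]
```

Because the encoding is fixed rather than learned, it adds no parameters and is defined for any sequence length.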

Korean Machine Reading Comprehension with S²-Net

Cheoneum Park, Changki Lee, Sulyn Hong, Yigyu Hwang, Taejoon Yoo, Hyunki Kim

http://doi.org/10.5626/JOK.2018.45.12.1260

Machine reading comprehension is the task of understanding a given context and identifying the correct answer within it. The Simple Recurrent Unit (SRU) mitigates the vanishing gradient problem of recurrent neural networks (RNNs) by using neural gates, as in the gated recurrent unit (GRU), and removes the previous hidden state from the gate inputs to improve speed. The self-matching network used in R-Net computes attention weights over its own RNN sequence, which lets it capture semantically similar context information, an effect similar to that of coreference resolution. In this paper, we propose the S²-Net model, which adds a self-matching layer to an encoder built from stacked SRUs, and we construct a Korean machine reading comprehension dataset. Experimental results show that the proposed S²-Net achieves 70.81% EM and 82.48% F1 on Korean machine reading comprehension.
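Removing the previous hidden state from the gate inputs is what gives SRU its speed: the matrix products for every time step can be computed at once, and only a cheap elementwise recurrence remains sequential. The sketch below implements one time step of the simplest published SRU formulation in plain Python (later SRU revisions add a c_{t-1} term inside the gates). Weights are caller-supplied, and the highway connection assumes the input and hidden sizes are equal.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sru_step(x: Vector, c_prev: Vector, w: Matrix, w_f: Matrix, b_f: Vector,
             w_r: Matrix, b_r: Vector) -> Tuple[Vector, Vector]:
    """One SRU time step; returns (h_t, c_t).

    The gates f and r depend only on x_t, never on h_{t-1}, so their matrix
    products could be precomputed for all time steps in parallel.
    """
    x_tilde = matvec(w, x)
    f = [sigmoid(a + b) for a, b in zip(matvec(w_f, x), b_f)]  # forget gate
    r = [sigmoid(a + b) for a, b in zip(matvec(w_r, x), b_r)]  # reset (highway) gate
    # Elementwise recurrence: the only sequential part of the computation.
    c = [fi * cp + (1 - fi) * xt for fi, cp, xt in zip(f, c_prev, x_tilde)]
    # Highway connection mixes the transformed state with the raw input.
    h = [ri * math.tanh(ci) + (1 - ri) * xi for ri, ci, xi in zip(r, c, x)]
    return h, c
```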

Korean Semantic Role Labeling Using Domain Adaptation Technique

Soojong Lim, Yongjin Bae, Hyunki Kim, Dongyul Ra

http://doi.org/

Developing a high-performance Semantic Role Labeling (SRL) system for a domain requires a large amount of manually annotated training data from the same domain. However, SRL training data of sufficient size is available for only a few domains. The performance of Korean SRL degrades by roughly 15% or more when a system is applied directly to another domain that has relatively little training data. This paper proposes two techniques to minimize performance degradation in domain transfer. First, we propose a domain adaptation algorithm for Korean SRL based on the prior model, one of the domain adaptation paradigms. Second, we propose using simplified features related to morphological and syntactic tags when training with small target-domain data, to suppress data sparseness. We experimentally compared our techniques with other domain adaptation techniques, using news as the source domain and Wikipedia as the target domain. The highest performance was achieved when our two techniques were applied together. Our system achieved an F1 score of 64.3%, which is 2.4 to 3.1% higher than the methods from other research.
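The abstract names the prior model as its adaptation paradigm without describing the classifier. One common form of that paradigm, sketched below as an assumption, trains a log-linear classifier on target-domain data while replacing the usual L2 penalty lam/2 * ||w||^2 with lam/2 * ||w - w_src||^2, a Gaussian prior centered on the source-domain weights. Binary logistic regression, plain gradient descent, and the parameter names are illustrative choices, not the paper's.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_with_prior(data: List[Tuple[Vector, int]], w_src: Vector,
                     lam: float = 1.0, lr: float = 0.1, epochs: int = 200) -> Vector:
    """Fit binary logistic regression on target data with a source-centered prior.

    Minimizes negative log-likelihood + lam/2 * ||w - w_src||^2 by full-batch
    gradient descent, starting from the source-domain weights.
    """
    w = list(w_src)
    for _ in range(epochs):
        grad = [lam * (wi - si) for wi, si in zip(w, w_src)]  # prior term
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k, xk in enumerate(x):
                grad[k] += (p - y) * xk  # gradient of the negative log-likelihood
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w
```

With no target data the weights stay exactly at `w_src`; as target data grows, the likelihood term pulls them toward the target domain, and a larger `lam` keeps them closer to the source model.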

Competition Relation Extraction based on Combining Machine Learning and Filtering

ChungHee Lee, YoungHoon Seo, HyunKi Kim

http://doi.org/

This study designs a hybrid algorithm for competition relation extraction. Previous work on relation extraction has relied on various lexical and deep parsing indicators and has mostly used machine learning alone. We present a new algorithm that integrates machine learning with various filtering methods. We also introduce some simple but useful features for competition relation extraction and propose an optimal feature set. The goal of this paper is to increase the precision of competition relation extraction by combining supervised learning with various filtering methods. The filtering methods classify whether a competition relation occurs, apply a distance restriction to filter feature pairs, and classify whether a candidate entity pair is spam. For evaluation, a test set of 2,565 sentences was used. The proposed method was compared with a rule-based method and a general relation extraction method: the rule-based method achieved a positive precision of 0.812 and an accuracy of 0.568, while the general relation extraction method achieved 0.612 and 0.563, respectively. The proposed system obtained a positive precision of 0.922 and an accuracy of 0.713. These results demonstrate that the proposed method is effective for competition relation extraction.
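The abstract does not give the exact features, classifier, or filter rules. The sketch below only illustrates the combination pattern it describes, a supervised classifier whose positive decisions must also pass a series of filters; the `Candidate` fields, the threshold, the distance limit, and the spam list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Candidate:
    """A hypothetical candidate entity pair found in one sentence."""
    entity_a: str
    entity_b: str
    token_distance: int       # tokens between the two entity mentions
    classifier_score: float   # score from some trained relation classifier


Filter = Callable[[Candidate], bool]


def distance_filter(max_distance: int) -> Filter:
    """Reject pairs whose mentions are too far apart."""
    return lambda c: c.token_distance <= max_distance


def spam_filter(spam_pairs: Set[Tuple[str, str]]) -> Filter:
    """Reject pairs previously judged to be spam."""
    return lambda c: (c.entity_a, c.entity_b) not in spam_pairs


def extract(cands: List[Candidate], threshold: float,
            filters: List[Filter]) -> List[Candidate]:
    """Accept a pair only if the classifier accepts it AND every filter passes."""
    return [c for c in cands
            if c.classifier_score >= threshold and all(f(c) for f in filters)]
```

Because the filters can only remove pairs the classifier already accepted, they can raise precision but never recall, which matches the stated goal of increasing precision.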

Korean Semantic Role Labeling Using Structured SVM

Changki Lee, Soojong Lim, Hyunki Kim

http://doi.org/

Semantic role labeling (SRL) systems determine the semantic role labels of the arguments of predicates in natural language text. An SRL system usually performs four tasks in sequence: Predicate Identification (PI), Predicate Classification (PC), Argument Identification (AI), and Argument Classification (AC). In this paper, we develop a Korean semantic role labeling system on the Korean Propbank that performs sequence labeling with a structured Support Vector Machine (SVM). Experiments on the Korean Propbank dataset show that our method obtains a 97.13% F1 score on Predicate Identification and Classification (PIC) and a 76.96% F1 score on Argument Identification and Classification (AIC).
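In structured SVM sequence labeling, a linear model scores a whole label sequence as a sum of per-token (emission) and label-to-label (transition) scores, and the best sequence is found with Viterbi dynamic programming, both at prediction time and, with the loss added to the scores, when searching for the most violated constraint during training. The sketch below shows only this decoding step; the learned weights are replaced by hand-written scores and the label names are hypothetical.

```python
from typing import Dict, List, Tuple


def viterbi(emission: List[Dict[str, float]],
            transition: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the label sequence maximizing the sum of emission + transition scores.

    emission[t][y] scores label y at position t; transition[(y, y2)] scores
    label y2 directly following y (missing pairs score 0).
    """
    labels = list(emission[0])
    best = {y: emission[0][y] for y in labels}  # best score of a path ending in y
    back: List[Dict[str, str]] = []
    for t in range(1, len(emission)):
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transition.get((p, y), 0.0))
            new_best[y] = best[prev] + transition.get((prev, y), 0.0) + emission[t][y]
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

A strongly negative transition score, for example between two consecutive identical argument labels, lets the decoder overrule locally preferred labels, which independent per-token classification cannot do.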


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr