Search: [ keyword: Transformer ] (28)

Analyzing the Impact of Sequential Context Learning on the Transformer Based Korean Text Summarization Model

Subin Kim, Yongjun Kim, Junseong Bang

http://doi.org/10.5626/JOK.2021.48.10.1097

Text summarization reduces sequence length while preserving the meaning of the entire article body, alleviating information overload and helping readers consume information quickly. To this end, research on Transformer-based English text summarization models has been active. Recently, an abstractive text summarization model was proposed that reflects the characteristics of English, a language with relatively fixed word order, by adding a Recurrent Neural Network (RNN)-based encoder. In this paper, we study the effect of sequential context learning on abstractive text summarization by using an RNN-based encoder for Korean, which has a freer word order than English. A Transformer-based model and a model that adds an RNN-based encoder to the existing Transformer are trained, and their performance on headline generation and article body summarization is compared on directly collected Korean articles. Experiments show that the model performs better when the RNN-based encoder is added, indicating that learning sequential contextual information is necessary for Korean abstractive text summarization.
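To make the architecture concrete, the following minimal PyTorch sketch (an illustration, not the authors' code) places a bidirectional GRU in front of a standard Transformer so that token representations carry sequential context before self-attention is applied; the dimensions, vocabulary size, and dummy inputs are all assumptions.

import torch
import torch.nn as nn

class RNNAugmentedTransformer(nn.Module):
    def __init__(self, vocab_size=8000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Bidirectional GRU injects word-order (sequential) context;
        # its two directions together project back to d_model.
        self.rnn = nn.GRU(d_model, d_model // 2, batch_first=True,
                          bidirectional=True)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src, _ = self.rnn(self.embed(src_ids))   # sequential context first
        hidden = self.transformer(src, self.embed(tgt_ids))
        return self.out(hidden)                  # per-token vocabulary logits

model = RNNAugmentedTransformer()
logits = model(torch.randint(0, 8000, (2, 40)),   # article token ids
               torch.randint(0, 8000, (2, 12)))   # summary tokens so far
print(logits.shape)  # torch.Size([2, 12, 8000])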

SMERT: Single-stream Multimodal BERT for Sentiment Analysis and Emotion Detection

Kyeonghun Kim, Jinuk Park, Jieun Lee, Sanghyun Park

http://doi.org/10.5626/JOK.2021.48.10.1122

Sentiment Analysis is the task of analyzing subjective opinion or disposition, and Emotion Detection is the task of finding emotions such as ‘happy’ or ‘sad’ in text data. Multimodal data refers to image and voice data appearing alongside text data. Prior research used RNN or cross-transformer models; however, RNN models suffer from long-term dependency problems, and cross-transformer models performed worse because they could not capture the attributes of the individual modalities. To solve these problems, we propose SMERT, a single-stream transformer that runs on a single network. SMERT obtains a joint representation for Sentiment Analysis and Emotion Detection. In addition, we adapt the BERT pre-training tasks for use with multimodal data. We verify the superiority of SMERT through comparative experiments on combinations of modalities using the CMU-MOSEI dataset and various evaluation metrics.
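A hypothetical single-stream sketch in PyTorch is given below: every modality is projected into one token sequence with a modality-type embedding and encoded by a single Transformer, yielding a joint representation with a sentiment head and an emotion head. The feature dimensions follow common CMU-MOSEI conventions but are assumptions, not the paper's values.

import torch
import torch.nn as nn

class SingleStreamMultimodal(nn.Module):
    def __init__(self, d=128, text_dim=300, audio_dim=74, visual_dim=35):
        super().__init__()
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, d),
            "audio": nn.Linear(audio_dim, d),
            "visual": nn.Linear(visual_dim, d),
        })
        self.type_embed = nn.Embedding(3, d)  # marks each token's modality
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.sentiment = nn.Linear(d, 1)   # regression head
        self.emotions = nn.Linear(d, 6)    # 6 CMU-MOSEI emotion labels

    def forward(self, text, audio, visual):
        tokens = []
        for i, (name, x) in enumerate([("text", text), ("audio", audio),
                                       ("visual", visual)]):
            tokens.append(self.proj[name](x) + self.type_embed.weight[i])
        joint = self.encoder(torch.cat(tokens, dim=1))  # one single stream
        pooled = joint.mean(dim=1)                      # joint representation
        return self.sentiment(pooled), self.emotions(pooled)

m = SingleStreamMultimodal()
s, e = m(torch.randn(2, 20, 300), torch.randn(2, 50, 74),
         torch.randn(2, 50, 35))
print(s.shape, e.shape)  # torch.Size([2, 1]) torch.Size([2, 6])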

Number Normalization in Korean Using the Transformer Model

Jaeyoon Chun, Chansong Jo, Jeongpil Lee, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2021.48.5.510

Text normalization is a significant component of text-to-speech (TTS) systems. Since numbers in Korean are read in various ways according to their context, number normalization in Korean is crucial to improving the quality of TTS systems. However, the existing model is based on ad hoc rules that are inappropriate for normalizing non-standard numbers. The purpose of this study was to propose a Korean number normalization model based on the sequence-to-sequence Transformer. Moreover, a number positional encoding was added to the model to handle long numbers. Overall, the proposed model achieved an F1 score of 98.80% on the normal test dataset and 90.1% on the non-standard test dataset, which were 2.52% and 19% higher, respectively, than the baseline model. In addition, the proposed model demonstrated a 13% improvement on the longer-number test dataset compared to the other deep learning models.
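The abstract does not spell out the number positional encoding, but a plausible (assumed) realization is indexing each digit by its place value within its number, counted from the right, so that long numbers remain readable to the model. The sketch below computes that index per character; the resulting indices could be embedded and summed with the usual positional encoding.

def number_positions(text: str) -> list[int]:
    """Return, per character, the digit's place within its contiguous
    number counted from the right (1 = ones, 2 = tens, ...), or 0 for
    non-digit characters."""
    pos = [0] * len(text)
    i = 0
    while i < len(text):
        if text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            for k in range(i, j):       # rightmost digit gets place 1
                pos[k] = j - k
            i = j
        else:
            i += 1
    return pos

print(number_positions("3시 30분"))  # [1, 0, 0, 2, 1, 0]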

Korean Semantic Role Labeling with BERT

Jangseong Bae, Changki Lee, Soojong Lim, Hyunki Kim

http://doi.org/10.5626/JOK.2020.47.11.1021

Semantic role labeling is a natural language processing task that identifies relationships such as "who, what, how, and why" within a sentence. Semantic role labeling studies mainly use machine learning algorithms and end-to-end methods that exclude hand-crafted feature information. Recently, the language model BERT (Bidirectional Encoder Representations from Transformers) has emerged in the natural language processing field, outperforming previous state-of-the-art models. Since the performance of end-to-end semantic role labeling is mainly influenced by the structure of the machine learning model and the pre-trained language model, in this paper we apply BERT to Korean semantic role labeling to improve performance. As a result, the Korean semantic role labeling model using BERT achieves 85.77%, outperforming the existing Korean semantic role labeling model.
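The setup can be illustrated as token classification with a BERT encoder and a label head, as in the Hugging Face sketch below. The checkpoint and label set are stand-ins, not the paper's: the authors use a Korean BERT, and bert-base-multilingual-cased is only a runnable substitute; the classification head is untrained until fine-tuned on SRL data.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical BIO-style role labels for illustration only.
labels = ["O", "B-ARG0", "I-ARG0", "B-ARG1", "I-ARG1", "B-ARGM", "I-ARGM"]
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels))

batch = tok("철수가 사과를 먹었다", return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits          # (1, seq_len, num_labels)
# Random until fine-tuned: the head is freshly initialized.
pred = [labels[i] for i in logits.argmax(-1)[0].tolist()]
print(list(zip(tok.convert_ids_to_tokens(batch["input_ids"][0]), pred)))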

LEXAI : Legal Document Similarity Analysis Service using Explainable AI

Juho Bai, Seog Park

http://doi.org/10.5626/JOK.2020.47.11.1061

Recently, with the improvement of deep learning, studies applying deep learning to specialized fields have diversified. Semantic search over legal documents is an essential part of the legal field. However, it is difficult to provide such search outside of expert-system-based services because it requires professional knowledge of the relevant field, and establishing an automated, semantically aware legal document retrieval environment is challenging because the cost of hiring professional human resources is high. While existing retrieval services are built on expert systems and statistical systems, the proposed method adopts deep learning with a classification task. We propose a database system structure that supports searching for legal documents with high semantic similarity using an explainable neural network, and we develop and verify a visual similarity assessment method for semantic relevance among similar documents.
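As an illustration of the retrieval side only (not the LEXAI system itself), the sketch below embeds documents with a generic BERT encoder via mean pooling and ranks them by cosine similarity; the checkpoint and the example texts are assumptions.

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state     # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)     # mean-pooled vectors

docs = ["계약 해지에 따른 손해배상", "임대차 보증금 반환 청구"]
q = embed(["보증금을 돌려받을 수 있는가"])
sims = torch.nn.functional.cosine_similarity(q, embed(docs))
print(docs[int(sims.argmax())])   # most semantically similar document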

Semantic Similarity-based Intent Analysis using Pre-trained Transformer for Natural Language Understanding

Sangkeun Jung, Hyein Seo, Hyunji Kim, Taewook Hwang

http://doi.org/10.5626/JOK.2020.47.8.748

Natural language understanding (NLU) is a central technique for developing robots, smart messengers, and natural interfaces. In this study, we propose a novel similarity-based intent analysis method instead of the typical classification approach to intent analysis in NLU. To accomplish this, neural network-based text and semantic frame readers are introduced to learn semantic vectors from pairwise text-semantic frame instances, and text-to-vector and semantic-frame-to-vector projection methods using a pre-trained transformer are proposed. We then attach the intent tag of the nearest training sentence to the query sentence by measuring semantic vector distances in the vector space. Four experiments on natural language corpora in Korean and English suggest that the proposed method outperforms existing intent analysis techniques: the two Korean experiments use weather and navigation corpora, and the two English experiments use air travel information system and voice platform corpora.
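The nearest-neighbor step can be sketched as follows: given semantic vectors for the training sentences (produced by any pre-trained transformer encoder), a query takes the intent tag of its closest training sentence. The vectors, intent tags, and dimensions below are random stand-ins, not the paper's data.

import torch
import torch.nn.functional as F

train_vecs = torch.randn(100, 256)           # encoded training sentences
train_intents = ["weather.ask", "nav.route"] * 50

def predict_intent(query_vec: torch.Tensor) -> str:
    # cosine similarity in the shared text/semantic-frame vector space
    sims = F.cosine_similarity(query_vec.unsqueeze(0), train_vecs)
    return train_intents[int(sims.argmax())]  # tag of nearest neighbor

print(predict_intent(torch.randn(256)))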

Performance Analysis of Korean Morphological Analyzer based on Transformer and BERT

Yongseok Choi, Kong Joo Lee

http://doi.org/10.5626/JOK.2020.47.8.730

This paper introduces a Korean morphological analyzer based on the Transformer, one of the most popular sequence-to-sequence deep neural models. The Transformer comprises an encoder and a decoder. The encoder compresses a raw input sentence into a fixed-size vector, while the decoder generates a morphological analysis result from the vector. We also replace the encoder with BERT, a pre-trained language representation model. An attention mechanism and a copying mechanism are integrated into the decoder. The processing units of the encoder and decoder are eojeol-based WordPiece and morpheme-based WordPiece, respectively. Experimental results show that the Transformer with fine-tuned BERT outperforms the randomly initialized Transformer by 2.9% in F1 score. We also investigate the effect of the WordPiece embeddings on morphological analysis when they are not fully updated during training.
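The copying mechanism can be illustrated with a pointer-generator-style mixture (the paper's exact formulation may differ): the decoder interpolates between generating from the morpheme vocabulary and copying a source WordPiece through the attention distribution.

import torch

def copy_mix(p_vocab, attn, src_ids, p_gen):
    """p_vocab: (B, V) generation probs; attn: (B, S) attention over the
    source; src_ids: (B, S) source token ids; p_gen: (B, 1) gate."""
    p_final = p_gen * p_vocab
    # scatter attention mass onto the vocabulary slots of source tokens
    return p_final.scatter_add(1, src_ids, (1 - p_gen) * attn)

B, S, V = 2, 7, 50
out = copy_mix(torch.softmax(torch.randn(B, V), -1),
               torch.softmax(torch.randn(B, S), -1),
               torch.randint(0, V, (B, S)),
               torch.sigmoid(torch.randn(B, 1)))
print(out.sum(-1))  # each row sums to 1: a valid output distribution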

Analysis of the Semantic Answer Types to Understand the Limitations of MRQA Models

Doyeon Lim, Haritz Puerto San Roman, Sung-Hyon Myaeng

http://doi.org/10.5626/JOK.2020.47.3.298

Recently, the performance of Machine Reading Question Answering (MRQA) models has surpassed that of humans on datasets such as SQuAD. For further advances in MRQA techniques, new datasets are being introduced, but they are rarely based on a deep understanding of the QA capabilities of existing models tested on previous datasets. In this study, we analyze the SQuAD dataset quantitatively and qualitatively to understand how MRQA models answer questions. It turns out that current MRQA models rely heavily on wh-words and Lexical Answer Types (LAT) in the questions rather than on the meanings of the entire questions and the evidence documents. Based on this analysis, we present directions for new datasets so that they can facilitate the advancement of QA techniques centered around MRQA models.
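The kind of surface-cue analysis described can be sketched as a tally of the wh-word each question leads with, to check how strongly predicted answer types track surface cues; the questions below are made-up samples, not the paper's data.

from collections import Counter

WH = ("what", "who", "when", "where", "why", "how", "which")

def wh_word(question: str) -> str:
    # return the first wh-word appearing in the question, if any
    for token in question.lower().split():
        if token.strip("?,.") in WH:
            return token.strip("?,.")
    return "other"

questions = ["What is the capital of France?",
             "When did the war end?",
             "In which year was KIISE founded?"]
print(Counter(wh_word(q) for q in questions))
# Counter({'what': 1, 'when': 1, 'which': 1})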

