Search : [ author: Hyopil Shin ] (6)

Contract Eligibility Verification Enhanced by Keyword and Contextual Embeddings

Sangah Lee, Seokgi Kim, Eunjin Kim, Minji Kang, Hyopil Shin

http://doi.org/10.5626/JOK.2022.49.10.848

Contracts must be reviewed to verify that they include all the clauses essential for their validity. Such clauses are highly formal and repetitive regardless of the type of contract, and automated legal technologies are required for legal text comprehension. In this paper, we constructed a simple item-by-item classification model for clauses in contracts to estimate contract eligibility, exploiting the formal and repetitive properties of contract clauses. We built keyword embeddings based on conventional requirements of contracts and concatenated them to sentence embeddings of clauses extracted from a BERT model fine-tuned on legal documents. Contract eligibility can then be verified from the predicted labels. With these methods, we report reasonable performance, with accuracies of 90.57 and 90.64 and F1-scores of 93.27 and 93.26, using the additional keyword embeddings alongside BERT embeddings.
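The concatenation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyword list, the cue-matching rule, and the stand-in sentence vector are all hypothetical, and the real model derives sentence embeddings from a fine-tuned BERT.

```python
# Hypothetical keyword list drawn from conventional contract requirements
KEYWORDS = ["party", "term", "payment", "termination", "liability"]

def keyword_embedding(clause, keywords=KEYWORDS):
    """Binary keyword-presence vector for a clause (illustrative cue set)."""
    tokens = clause.lower().split()
    return [1.0 if kw in tokens else 0.0 for kw in keywords]

def combined_embedding(sentence_vec, clause):
    """Concatenate a BERT-style sentence embedding with the keyword vector,
    forming the input to an item-by-item clause classifier."""
    return list(sentence_vec) + keyword_embedding(clause)

# Stand-in for a fine-tuned BERT sentence embedding (dimension 4 for brevity)
sent_vec = [0.12, -0.48, 0.91, 0.05]
vec = combined_embedding(sent_vec, "payment shall be made by each party")
```

The combined vector would then be fed to a simple classifier that predicts the clause's item label.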

Combining Sentiment-Combined Model with Pre-Trained BERT Models for Sentiment Analysis

Sangah Lee, Hyopil Shin

http://doi.org/10.5626/JOK.2021.48.7.815

It is known that BERT can capture various linguistic knowledge from raw text via language modeling without using any additional hand-crafted features. However, some studies have shown that BERT-based models with an additional use of specific language knowledge have higher performance for natural language processing problems associated with that knowledge. Based on such finding, we trained a sentiment-combined model by adding sentiment features to the BERT structure. We constructed sentiment feature embeddings using sentiment polarity and intensity values annotated in a Korean sentiment lexicon and proposed two methods (external fusing and knowledge distillation) to combine sentiment-combined model with a general-purpose BERT pre-trained model. The external fusing method resulted in higher performances in Korean sentiment analysis tasks with movie reviews and hate speech datasets than baselines from other pre-trained models not fused with sentiment-combined models. We also observed that adding sentiment features to the BERT structure improved the model’s language modeling and sentiment analysis performance. Furthermore, when implementing sentiment-combined models, training time and cost could be decreased by using a small-scale BERT model with a small number of layers, dimensions, and steps.
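The two ingredients above can be sketched at a toy scale. The lexicon entries and the weighted-logit fusing rule are assumptions for illustration; the paper's external fusing operates on full BERT models and a real Korean sentiment lexicon.

```python
# Toy sentiment lexicon: word -> (polarity, intensity). The actual work uses
# a Korean sentiment lexicon; these English entries are purely illustrative.
LEXICON = {"great": (1, 0.9), "awful": (-1, 0.8)}

def sentiment_feature(token):
    """Per-token sentiment feature vector from the lexicon ((0, 0.0) if absent),
    the kind of feature embedded alongside BERT's input representations."""
    polarity, intensity = LEXICON.get(token, (0, 0.0))
    return [float(polarity), intensity]

def external_fuse(logits_general, logits_sentiment, alpha=0.5):
    """One plausible reading of 'external fusing': a weighted combination of
    class logits from the general-purpose and sentiment-combined models."""
    return [alpha * g + (1 - alpha) * s
            for g, s in zip(logits_general, logits_sentiment)]

fused = external_fuse([2.0, -1.0], [1.0, 0.0], alpha=0.5)
```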

A Small-Scale Korean-Specific BERT Language Model

Sangah Lee, Hansol Jang, Yunmee Baik, Suzi Park, Hyopil Shin

http://doi.org/10.5626/JOK.2020.47.7.682

Recent sentence-embedding models use huge corpora and parameter counts; they demand massive data and large hardware, and pre-training takes extensive time. This tendency raises the need for a model with comparable performance that uses training data economically. In this study, we propose KR-BERT, a Korean-specific model using sub-character-level to character-level Korean dictionaries and a BidirectionalWordPiece Tokenizer. Our KR-BERT model performs comparably to, and in some cases better than, other existing pre-trained models while using one-tenth of their training data. This demonstrates that for a morphologically complex and low-resource language, sub-character-level representations and the BidirectionalWordPiece Tokenizer capture language-specific linguistic phenomena that the Multilingual BERT model misses.
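The sub-character level mentioned above refers to decomposing Hangul syllable blocks into their constituent jamo via standard Unicode arithmetic. The sketch below shows that decomposition; it illustrates the unit of representation only, not KR-BERT's actual tokenizer.

```python
def to_jamo(syllable):
    """Decompose one Hangul syllable into (initial, medial[, final]) jamo,
    the sub-character units a KR-BERT-style dictionary can be built over.
    Non-Hangul characters are returned unchanged."""
    code = ord(syllable) - 0xAC00  # Hangul syllables start at U+AC00
    if not 0 <= code < 11172:
        return (syllable,)
    cho, jung, jong = code // 588, (code % 588) // 28, code % 28
    CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
    JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
    JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ",
            "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ",
            "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
    parts = [CHO[cho], JUNG[jung]]
    if jong:
        parts.append(JONG[jong])
    return tuple(parts)
```

For example, the syllable "한" decomposes into the three jamo ㅎ, ㅏ, ㄴ, giving the tokenizer a level below whole characters to fall back on for rare words.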

An Analysis of Linear Argumentation Structure of Korean Debate Texts Using Sequential Modeling and Linguistic Features

Sangah Lee, Hyopil Shin

http://doi.org/10.5626/JOK.2018.45.12.1292

Current studies on argument mining produce tree-structured argumentation structures based on relational nuclearity and discourse relations between sentences in each document. In this case, when a full argumentation structure for a document is constructed bottom-up, inconsistencies between related sentences may occur. This paper introduces relations between the topic of a text and its sentences to provide a frame for the argumentation structure. Automatic analysis of argumentation structure uses contextual information from documents, as the argument types defined for each sentence are applied to a sequential model. We vectorized sentences using bag-of-words over morphemes, word embeddings of morphemes, and linguistic features extracted from each sentence, and used those vectors as inputs to models predicting the argument types in the document. The combination of linguistic features and the sequential model yielded the best result in our experiments, with an F1-score of 0.68.
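A per-sentence linguistic feature vector of the kind fed to such a sequential model might look like the sketch below. The specific features and cue words are hypothetical; the paper's feature set and its Korean morpheme-level processing are not reproduced here.

```python
# Hypothetical discourse-connective cues (the paper works on Korean text)
CONNECTIVES = ["therefore", "however", "because"]

def linguistic_features(sentence, position, total):
    """Feature vector for one sentence: relative position in the document,
    sentence length, and discourse-connective indicators. A sequential model
    (e.g. a CRF or RNN over the sentence sequence) would consume these."""
    tokens = sentence.lower().split()
    return [
        position / total,                                   # relative position
        float(len(tokens)),                                 # sentence length
        *[1.0 if c in tokens else 0.0 for c in CONNECTIVES],
    ]

feats = linguistic_features("therefore we should act now", 2, 4)
```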

Measuring Semantic Orientation of Words using Temporal Difference Learning

Youngsam Kim, Hyopil Shin

http://doi.org/10.5626/JOK.2018.45.12.1287

Temporal-difference (TD) learning is a core algorithm of reinforcement learning that employs models of Markov processes. In TD methods, rewards are discounted by a discount factor, and states receive these discounted values as their rewards. In this paper, we estimated the semantic orientation of words in texts using TD-based methods and examined their effectiveness by comparing them to existing feature selection methods (an indirect approach) and Bayes probabilities (a direct approach). TD-based estimation would be useful for social opinion mining tasks, since TD learning is inherently an online method. To show that our approach scales to large data, the estimation method was also evaluated using asynchronous parallel processing.
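One way to frame this, sketched below under assumptions of our own (the paper's exact formulation may differ): treat the words of a document as a sequence of states, give the document's sentiment label as a terminal reward, and let TD(0) updates, V(s) ← V(s) + α[r + γV(s′) − V(s)], propagate that signal backward so each word accumulates a semantic-orientation value.

```python
def td_semantic_orientation(docs, alpha=0.1, gamma=0.9, epochs=50):
    """TD(0) sketch: each word is a state; the document label is the terminal
    reward; intermediate transitions carry only the discounted next-state
    value. Returns a word -> orientation-value table."""
    V = {}
    for _ in range(epochs):
        for words, reward in docs:
            for i, w in enumerate(words):
                V.setdefault(w, 0.0)
                if i + 1 < len(words):
                    target = gamma * V.setdefault(words[i + 1], 0.0)
                else:
                    target = reward  # terminal reward = document sentiment
                V[w] += alpha * (target - V[w])
    return V

docs = [(["this", "movie", "is", "great"], 1.0),
        (["this", "movie", "is", "awful"], -1.0)]
V = td_semantic_orientation(docs)
```

Because each update touches only the current transition, the estimates can be refined online as new documents stream in, which is the property the abstract highlights for opinion mining.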

Automatic Product Review Helpfulness Estimation based on Review Information Types

Munhyong Kim, Hyopil Shin

http://doi.org/

The many product reviews available online for any given product make it difficult for a consumer to locate the helpful ones. The purpose of this study was to investigate automatic helpfulness evaluation of online product reviews according to review information types, based on the target of the information. The underlying assumption was that consumers find reviews containing specific information about the product itself, or about the reliability of the reviewer, more helpful than peripheral information such as shipping or customer service. Therefore, each sentence was categorized by information type, which reduced the semantic space of review sentences. Subsequently, we extracted specific information from sentences using a topic-based representation of the sentences and a clustering algorithm. Review ranking experiments showed more effective results than other comparable approaches.
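The sentence-categorization step could be sketched as below. The category names and cue words are invented for illustration; the study's actual types and its topic-based clustering pipeline are not shown.

```python
# Hypothetical information types keyed by cue words (illustrative only)
CATEGORIES = {
    "product":    ["quality", "size", "battery"],
    "reviewer":   ["bought", "used", "owned"],
    "peripheral": ["shipping", "delivery", "service"],
}

def information_type(sentence):
    """Assign a sentence to the information type whose cues it matches most,
    so later helpfulness ranking can weight product/reviewer information
    above peripheral information."""
    tokens = sentence.lower().split()
    scores = {cat: sum(t in tokens for t in cues)
              for cat, cues in CATEGORIES.items()}
    return max(scores, key=scores.get)

label = information_type("the shipping was slow but fine")
```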


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr