Search : [ keyword: Embedding ] (69)

Deep Learning-based Text Classification Model for Poisonous Clauses Classification

Gihyeon Choi, Youngjin Jang, Harksoo Kim, Kwanwoo Kim

http://doi.org/10.5626/JOK.2020.47.11.1054

Most companies conclude a contract before executing a task, and serious problems can arise if poisonous clauses are not identified before the contract is concluded. To prevent such problems, companies have experts review their contracts, but this service requires considerable time and money. A system that identifies poisonous clauses through a prior review of the contract could reduce this cost and time. This paper therefore proposes a text classification model that identifies poisonous clauses by taking each paragraph of the contract as input. To improve classification performance, the importance of each sentence is computed from the relationship between the sentences in the paragraph and the class to be classified, and this importance is reflected in each sentence representation during classification. The proposed model achieved an F1 score of 84.51% in experiments on actual contract data and the highest performance, an F1 score of 93.64%, in experiments on the WOS-5736 dataset used for comparison with existing text classification models.
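
Purely as an illustration of the kind of class-aware sentence weighting described above (and not the authors' actual architecture), the following PyTorch sketch scores each encoded sentence against learnable per-class query vectors and forms an importance-weighted paragraph representation; the dimensions, layer choices, and max-over-classes scoring rule are all assumptions.

# Hypothetical sketch of class-aware sentence weighting for paragraph
# classification; dimensions and layers are illustrative only.
import torch
import torch.nn as nn

class SentenceAttentionClassifier(nn.Module):
    def __init__(self, sent_dim=256, num_classes=2):
        super().__init__()
        # one learnable query vector per class, used to score sentences
        self.class_queries = nn.Parameter(torch.randn(num_classes, sent_dim))
        self.classifier = nn.Linear(sent_dim, num_classes)

    def forward(self, sent_vecs):                 # sent_vecs: (num_sents, sent_dim)
        # relevance of each sentence to each class
        scores = sent_vecs @ self.class_queries.t()                # (num_sents, num_classes)
        weights = torch.softmax(scores.max(dim=1).values, dim=0)   # (num_sents,)
        # paragraph vector = importance-weighted sum of sentence vectors
        para_vec = (weights.unsqueeze(1) * sent_vecs).sum(dim=0)
        return self.classifier(para_vec)

para = torch.randn(7, 256)                        # 7 encoded sentences in one paragraph
logits = SentenceAttentionClassifier()(para)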

Visual Commonsense Reasoning with Vision-Language Co-embedding and Knowledge Graph Embedding

Jaeyun Lee, Incheol Kim

http://doi.org/10.5626/JOK.2020.47.10.985

In this paper, we propose a novel model for Visual Commonsense Reasoning (VCR). The proposed model co-embeds multi-modal input data using a pre-trained vision-language model to effectively cope with the problem of visual grounding, which requires mutual alignment between an image, a natural language question, and the corresponding answer list. In addition, the proposed model extracts the common conceptual knowledge necessary for visual commonsense reasoning from ConceptNet, an open knowledge base, and then embeds it using a Graph Convolutional Network (GCN). We introduce the design details of the proposed model, VLKG_VCR, and verify its performance through various experiments on an enhanced VCR benchmark dataset.
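
To make the knowledge-embedding step concrete, here is a minimal, self-contained sketch of one graph convolutional layer applied to a small concept subgraph; the adjacency matrix, feature sizes, and initial embeddings are placeholders, not the VLKG_VCR implementation.

# Minimal sketch of embedding a ConceptNet-style subgraph with one GCN layer.
import torch
import torch.nn as nn

def gcn_layer(H, A, W):
    # symmetric normalization of the adjacency with self-loops
    A_hat = A + torch.eye(A.size(0))
    deg = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

num_concepts, in_dim, out_dim = 5, 300, 128
H = torch.randn(num_concepts, in_dim)            # initial concept embeddings (placeholder)
A = torch.bernoulli(torch.full((num_concepts, num_concepts), 0.3))
A = ((A + A.t()) > 0).float()                    # undirected toy graph
W = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)
concept_emb = gcn_layer(H, A, W)                 # (num_concepts, out_dim)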

Incorrect Triple Detection Using Knowledge Graph Embedding and Adaptive Clustering

Won-Chul Shin, Jea-Seung Roh, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.10.958

With the growth of information brought by the development of the Internet, research using large-scale knowledge graphs is being actively conducted. As knowledge graphs are used for various research and services, there is a corresponding need to secure high-quality knowledge graphs, yet research on detecting errors within knowledge graphs remains scarce. Previous studies that used embedding and clustering for error triple detection showed good performance, but because a single threshold was applied to all clusters during cluster optimization, the characteristics of individual clusters could not be taken into account. To resolve this problem, we propose an adaptive clustering model for error triple detection that embeds the knowledge graph and then finds and applies an optimal threshold for each cluster. To evaluate the proposed method, we conducted comparative experiments against existing error triple detection studies on three datasets, DBpedia, Freebase, and WiseKB, and confirmed an average improvement of 5.3% in F1-score.
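
As a rough illustration of per-cluster adaptive thresholds (not the paper's exact optimization procedure), the sketch below clusters embedded triples with k-means and flags, within each cluster, triples whose distance to the centroid exceeds a cluster-specific cutoff; the mean-plus-two-standard-deviations rule is an assumption made for illustration.

# Illustrative sketch of per-cluster adaptive thresholds for flagging suspicious triples.
import numpy as np
from sklearn.cluster import KMeans

triple_vecs = np.random.randn(200, 64)            # embedded (h, r, t) triples (placeholder)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(triple_vecs)

flagged = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(triple_vecs[members] - km.cluster_centers_[c], axis=1)
    # adaptive threshold per cluster instead of one global cutoff
    threshold = dists.mean() + 2.0 * dists.std()
    flagged.extend(members[dists > threshold].tolist())

print(f"{len(flagged)} candidate error triples")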

Open Domain Question Answering using Knowledge Graph

Giho Lee, Incheol Kim

http://doi.org/10.5626/JOK.2020.47.9.853

In this paper, we propose a novel knowledge graph inference model called KGNet for answering open-domain complex questions. The model addresses the problem of knowledge base incompleteness by integrating two different types of knowledge resources, a knowledge base and a text corpus, into a single knowledge graph. Moreover, to derive answers to complex multi-hop questions effectively, the model adopts a new knowledge embedding and reasoning module based on a Graph Neural Network (GNN). We demonstrate the effectiveness and performance of the proposed model through various experiments on two large question answering benchmark datasets, WebQuestionsSP and MetaQA.
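
A minimal sketch of the final answer-selection step, under the assumption that a GNN has already produced node embeddings over the fused KB-plus-corpus graph: candidate entities are ranked by similarity to the question embedding. All tensors below are placeholders, not KGNet's actual modules.

# Hedged sketch: rank candidate answer nodes against the encoded question.
import torch

node_emb = torch.randn(100, 128)                  # GNN-updated node embeddings (placeholder)
question_emb = torch.randn(128)                   # encoded multi-hop question (placeholder)
candidate_ids = torch.tensor([3, 17, 42, 58])     # entities reachable from the topic entity

scores = node_emb[candidate_ids] @ question_emb
best = candidate_ids[scores.argmax()].item()
print("predicted answer node:", best)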

Path Embedding-Based Knowledge Graph Completion Approach

Batselem Jagvaral, Min-Sung Kim, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.8.722

Knowledge graphs are widely used in question answering systems, but many of the relations between their entities are missing. To address this issue, we propose a CNN (Convolutional Neural Network) + BiLSTM (Bidirectional LSTM) based approach to infer missing links in knowledge graphs. Our method embeds the paths connecting two entities into a low-dimensional space via the CNN + BiLSTM. An attention operation is then used to combine the path embeddings into a representation of the entity pair. Finally, we measure the similarity between the target relation and this representation to predict whether the relation connects the two entities. By combining a CNN and a BiLSTM, we take advantage of the CNN’s ability to recognize local patterns and the LSTM’s ability to model the ordering of entities and relations, which makes it possible to effectively extract low-dimensional path features and predict relationships between entities from the learned features. In our experiments, we performed link prediction tasks on four different knowledge graphs and showed that our method achieves results comparable to state-of-the-art methods.
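
The sketch below loosely mirrors the described pipeline: each relation path is encoded with a Conv1d followed by a BiLSTM, and attention over the path vectors against a target-relation embedding yields the entity-pair representation. The layer sizes, the use of the last LSTM state, and the sigmoid scoring function are illustrative assumptions, not the paper's exact configuration.

# Rough sketch of a CNN + BiLSTM path encoder with attention over paths.
import torch
import torch.nn as nn

class PathEncoder(nn.Module):
    def __init__(self, dim=100, hidden=100):
        super().__init__()
        self.conv = nn.Conv1d(dim, hidden, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, paths):                     # paths: (num_paths, path_len, dim)
        x = torch.relu(self.conv(paths.transpose(1, 2))).transpose(1, 2)
        out, _ = self.lstm(x)                     # (num_paths, path_len, 2*hidden)
        return out[:, -1]                         # last step as the path vector

enc = PathEncoder()
paths = torch.randn(4, 5, 100)                    # 4 paths of length 5 between two entities
path_vecs = enc(paths)                            # (4, 200)
target_rel = torch.randn(200)                     # embedding of the queried relation
attn = torch.softmax(path_vecs @ target_rel, dim=0)       # attention over paths
pair_vec = (attn.unsqueeze(1) * path_vecs).sum(dim=0)     # entity-pair representation
score = torch.sigmoid(pair_vec @ target_rel)               # probability the relation holds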

DPESS: Daytime Satellite Imagery-based Prediction of Demographic Attributes Using Embedding Spatial Statistics

Hyunji Cha, Sungwon Han, Donghyun Ahn, Sungwon Park, Meeyoung Cha

http://doi.org/10.5626/JOK.2020.47.8.742

Studies that predict or analyze demographic attributes, which serve as socioeconomic indicators, from satellite images are being actively conducted. We present a new approach, called DPESS, for estimating demographic attributes from daytime satellite imagery based on a deep neural network model. The four steps of DPESS summarize an arbitrary number of input images into a fixed-length embedding vector without considerable loss of information, which is made possible by its structure and by techniques such as transfer learning and embedded spatial statistics. Our extensive validation demonstrates that the DPESS model can predict various advanced demographics such as population density (R² = 0.94), population count by age group (0.80), population count by education degree (0.79), and total purchase amount per household (0.80). We discuss future applications of this method, including applying our algorithm to other countries.
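
One way to read "embedded spatial statistics" is as pooling a variable number of per-image embeddings into fixed-length summary statistics. The toy sketch below does exactly that with simulated CNN features; it is only an interpretation of this step, not the published DPESS code.

# Toy sketch: summarize a variable number of image embeddings per district
# into one fixed-length vector by concatenating element-wise statistics.
import numpy as np

def embed_district(image_embeddings):
    # image_embeddings: (num_images, dim), where num_images varies per district
    mean = image_embeddings.mean(axis=0)
    std = image_embeddings.std(axis=0)
    mx = image_embeddings.max(axis=0)
    return np.concatenate([mean, std, mx])        # fixed 3*dim vector

district_a = np.random.randn(12, 512)             # 12 satellite tiles (simulated features)
district_b = np.random.randn(35, 512)             # 35 satellite tiles (simulated features)
vec_a, vec_b = embed_district(district_a), embed_district(district_b)
assert vec_a.shape == vec_b.shape == (1536,)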

A Small-Scale Korean-Specific BERT Language Model

Sangah Lee, Hansol Jang, Yunmee Baik, Suzi Park, Hyopil Shin

http://doi.org/10.5626/JOK.2020.47.7.682

Recent sentence-embedding models use huge corpora and numbers of parameters; they require massive data, large hardware, and extensive pre-training time. This tendency raises the need for a model that achieves comparable performance while using training data economically. In this study, we propose a Korean-specific model, KR-BERT, which uses sub-character-level and character-level Korean dictionaries and a BidirectionalWordPiece tokenizer. Our KR-BERT model performs comparably to, and in some cases better than, existing pre-trained models while using one-tenth of their training data. This demonstrates that, for a morphologically complex and low-resource language, sub-character-level representations and the BidirectionalWordPiece tokenizer capture language-specific linguistic phenomena that the Multilingual BERT model misses.
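
As a hedged illustration of what a bidirectional WordPiece-style tokenizer might do, the toy code below runs greedy longest-match segmentation from both directions over a made-up vocabulary. The vocabulary, the example word, and the idea that the model then chooses between the two segmentations are assumptions for illustration; the exact KR-BERT selection rule is not reproduced here.

# Toy forward and backward greedy longest-match WordPiece segmentation.
def greedy_forward(word, vocab):
    tokens, i = [], 0
    while i < len(word):
        j = len(word)
        while j > i:
            piece = word[i:j] if i == 0 else "##" + word[i:j]
            if piece in vocab:
                tokens.append(piece)
                break
            j -= 1
        else:
            return None                            # cannot tokenize this word
        i = j
    return tokens

def greedy_backward(word, vocab):
    tokens, j = [], len(word)
    while j > 0:
        i = 0
        while i < j:
            piece = word[i:j] if i == 0 else "##" + word[i:j]
            if piece in vocab:
                tokens.insert(0, piece)
                break
            i += 1
        else:
            return None
        j = i
    return tokens

vocab = {"pla", "playe", "play", "##yers", "##rs", "##er", "##s"}
print(greedy_forward("players", vocab))            # ['playe', '##rs']
print(greedy_backward("players", vocab))           # ['pla', '##yers']
# A bidirectional tokenizer would then choose between such forward and
# backward segmentations; the selection criterion is omitted here.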

Approach for Managing Multiple Class Membership in Knowledge Graph Completion Using Bi-LSTM

Jae-Seung Roh, Batselem Jagvaral, Wan-Gon Lee, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.6.559

Knowledge graphs, which represent real-world information in a structured way, are widely used in areas such as Web browsing and recommendation systems, but they suffer from missing links between entities. To resolve this issue, various studies using embedding techniques or deep learning have been proposed. In particular, a recent study combining a CNN and a Bidirectional LSTM showed high performance compared to previous studies. However, in that study, if multiple class types are defined for a single entity, the amount of training data, and with it the training time, increases exponentially; moreover, if no class type information is defined for an entity, training data for that entity cannot be generated at all. To enable the generation of training data for such entities and to manage multiple class membership in knowledge graph completion, we propose two approaches that use pre-trained knowledge graph embedding vectors and the concept of vector addition. To evaluate the proposed methods, we conducted comparative experiments against existing knowledge graph completion studies on the NELL-995 and FB15K-237 datasets and obtained MAP and MRR scores that were 1.6%p and 1.5%p higher, respectively, than those of the previous studies.
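
A simplified sketch of the vector-addition idea: an entity that belongs to several classes gets a single type vector by summing the pre-trained embeddings of its class types, so training data need not be duplicated per class. The class names, dimensions, and the zero-vector fallback for untyped entities are assumptions, not the paper's exact design.

# Sum pre-trained class-type embeddings into one vector per entity.
import numpy as np

class_emb = {                                      # pre-trained class-type embeddings (placeholders)
    "Person":  np.random.randn(50),
    "Athlete": np.random.randn(50),
    "Coach":   np.random.randn(50),
}

def entity_type_vector(class_types, class_emb, dim=50):
    if not class_types:                            # untyped entity: fall back to a zero vector
        return np.zeros(dim)
    return np.sum([class_emb[c] for c in class_types], axis=0)

vec = entity_type_vector(["Person", "Athlete"], class_emb)   # one vector, no data blow-up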

Defining Chunks and Chunking using Its Corpus and Bi-LSTM/CRFs in Korean

Young Namgoong, Chang-Hyun Kim, Min-ah Cheon, Ho-min Park, Ho Yoon, Min-seok Choi, Jae-kyun Kim, Jae-Hoon Kim

http://doi.org/10.5626/JOK.2020.47.6.587

There are several notorious problems in Korean dependency parsing, such as the head position problem and the constituent unit problem, that can be somewhat mitigated by chunking. Chunking seeks to locate constituents, referred to as chunks, and classify them into predefined categories. So far, several Korean chunking studies have been conducted without a complete and clear definition of chunks. Thus, we thoroughly define chunks in Korean, build a chunk-tagged corpus based on this definition, and propose a Bi-LSTM/CRF chunking model trained on the corpus. Through experiments, we show that the proposed model achieves an F1-score of 98.54% and can be used in practical applications. We also analyzed performance variations across word embeddings, and fastText showed the best performance. Finally, an error analysis was performed so that it can be used to improve the proposed model in the near future.
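
A minimal Bi-LSTM/CRF tagger sketch for BIO-style chunk tagging, assuming the third-party pytorch-crf package for the CRF layer; the vocabulary size, tag set size, and hidden dimensions are placeholders rather than the corpus's actual settings.

# Minimal BiLSTM-CRF chunker sketch (requires the `pytorch-crf` package).
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRFChunker(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=100, hidden=128, num_tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(2 * hidden, num_tags)   # emission scores per token
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags):
        emissions = self.to_tags(self.lstm(self.emb(tokens))[0])
        return -self.crf(emissions, tags)                # negative log-likelihood

    def predict(self, tokens):
        emissions = self.to_tags(self.lstm(self.emb(tokens))[0])
        return self.crf.decode(emissions)                # best BIO tag sequences

model = BiLSTMCRFChunker()
tokens = torch.randint(0, 1000, (2, 7))                  # batch of 2 toy sentences
tags = torch.randint(0, 9, (2, 7))
print(model.loss(tokens, tags).item(), model.predict(tokens))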

Comparison of Context-Sensitive Spelling Error Correction using Embedding Techniques

Jung-Hun Lee, Minho Kim, Hyuk-Chul Kwon

http://doi.org/10.5626/JOK.2020.47.2.147

This paper focuses on the use of embedding techniques to solve problems in context-sensitive spelling correction and compares the performance of each technique. Word vectors obtained through embedding learning are used to measure the distance between a correction target word and its surrounding context words. We tried to improve correction performance by handling words not included in the training corpus and by reflecting the contextual information surrounding the correction targets. The embedding techniques used for correction were divided into word-based embeddings and embeddings that reflect contextual information. Focusing on these two improvement goals, we performed correction experiments with the embedding techniques and obtained reliable correction performance.
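
As a toy illustration of embedding-based context-sensitive correction, the snippet below picks, from a confusion set, the candidate whose vector is closest (by cosine similarity) to the averaged context vectors; the vocabulary and vectors are random placeholders, not the paper's trained embeddings.

# Choose the candidate correction most similar to the surrounding context.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

emb = {w: np.random.randn(100) for w in ["bank", "bang", "river", "money", "deposit"]}
context = ["river", "deposit"]                     # words around the target position
context_vec = np.mean([emb[w] for w in context], axis=0)

candidates = ["bank", "bang"]                      # confusion set for the target word
best = max(candidates, key=lambda w: cosine(emb[w], context_vec))
print("corrected word:", best)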

