Search : [ author: Batselem ] (10)

An Explainable Knowledge Completion Model Using Explanation Segments

Min-Ho Lee, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/10.5626/JOK.2021.48.6.680

Recently, a large number of studies using deep learning have been conducted to predict new links in incomplete knowledge graphs. However, link prediction using deep learning has a major limitation: the inferred results cannot be explained. We propose a knowledge graph prediction model that yields explainable inference paths supporting its inference results. We extract paths from the knowledge graph to the object using a path ranking algorithm and define them as explanation segments. The generated explanation segments are then embedded using a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network. The link prediction model is trained by applying an attention mechanism based on the semantic similarity between the embedded explanation segments and the candidate predicates to be inferred. The explanation segment best suited to explaining a link prediction is selected based on the measured attention scores. To evaluate the proposed method, we performed a link prediction comparison experiment and an accuracy verification experiment measuring the proportion of explanation segments suitable for explaining the link prediction results. We used the benchmark datasets NELL-995, FB15K-237, and Countries, and the accuracy verification experiments showed accuracies of 89%, 44%, and 97%, respectively. Compared with the existing method, our model exhibited 35%p and 21%p higher performance on average on NELL-995 and FB15K-237.
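As a rough illustration of the attention step described above, the sketch below scores toy numpy vectors standing in for the CNN + BiLSTM segment embeddings (the encoders themselves are not reproduced); function names and the cosine scoring choice are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def attention_weights(segment_embs, predicate_emb):
    """Softmax over the semantic similarity between each embedded
    explanation segment and the candidate predicate embedding."""
    logits = segment_embs @ predicate_emb
    logits = logits - logits.max()        # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def link_score(segment_embs, predicate_emb):
    """Attention-pool the segments, then score the candidate predicate
    by cosine similarity with the pooled representation."""
    w = attention_weights(segment_embs, predicate_emb)
    pooled = w @ segment_embs
    return float(pooled @ predicate_emb /
                 (np.linalg.norm(pooled) * np.linalg.norm(predicate_emb)))
```

The segment with the largest attention weight would then be reported as the explanation for the predicted link.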

Path Embedding-Based Knowledge Graph Completion Approach

Batselem Jagvaral, Min-Sung Kim, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.8.722

Knowledge graphs are widely used in question answering systems, yet many of the relations between entities in a knowledge graph tend to be missing. To solve this issue, we propose a CNN (Convolutional Neural Network) + BiLSTM (Bidirectional LSTM) based approach to infer missing links in knowledge graphs. Our method embeds paths connecting two entities into a low-dimensional space via CNN + BiLSTM. An attention operation is then used to attentively combine the path embeddings into a representation of the two entities. Finally, we measure the similarity between the target relation and this representation to predict whether the relation connects those entities. By combining a CNN and a BiLSTM, we take advantage of the CNN's ability to recognize local patterns and the LSTM's ability to model entity and relation ordering. In this way, low-dimensional path features can be identified effectively and the relationships between entities predicted from the learned features. In our experiments, we performed link prediction tasks on four different knowledge graphs and showed that our method achieves results comparable to state-of-the-art methods.
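A minimal sketch of the CNN stage (local-pattern extraction over a path of embeddings, before the BiLSTM); the kernel shapes, tanh activation, and max-pooling here are illustrative choices, not the paper's exact architecture:

```python
import numpy as np

def conv_path_features(path_embs, kernels):
    """Slide each kernel over windows of consecutive embeddings in the
    path (CNN stage), then max-pool over positions so paths of any
    length map to a fixed-size feature vector."""
    k, w, d = kernels.shape                 # filters, window size, embedding dim
    n = path_embs.shape[0]
    flat = kernels.reshape(k, -1)
    feats = np.empty((k, n - w + 1))
    for i in range(n - w + 1):
        feats[:, i] = flat @ path_embs[i:i + w].ravel()
    return np.tanh(feats).max(axis=1)       # local patterns, pooled per filter
```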

Approach for Managing Multiple Class Membership in Knowledge Graph Completion Using Bi-LSTM

Jae-Seung Roh, Batselem Jagvaral, Wan-Gon Lee, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.6.559

Knowledge graphs that represent real-world information in a structured way are widely used in areas such as Web browsing and recommendation systems, but links between entities are often missing. To resolve this issue, various studies using embedding techniques or deep learning have been proposed. In particular, a recent study combining a CNN and a Bidirectional LSTM showed high performance compared to previous studies. However, in that study, if multiple class types are defined for a single entity, the amount of training data increases exponentially, and with it the training time. Also, if no class type information is defined for an entity, training data for that entity cannot be generated at all. To enable the generation of training data for such entities and to manage multiple class membership in knowledge graph completion, we propose two approaches using pre-trained knowledge graph embedding vectors and the concept of vector addition. To evaluate the proposed methods, we conducted comparative experiments with existing knowledge completion studies on the NELL-995 and FB15K-237 datasets and obtained MAP and MRR scores 1.6%p and 1.5%p higher, respectively, than those of previous studies.
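The vector-addition idea can be sketched as follows, assuming the pre-trained class embeddings are available as a dictionary (names and the zero-vector fallback are illustrative):

```python
import numpy as np

def entity_type_vector(class_embs, classes, dim):
    """Represent an entity's (possibly multiple) class memberships as the
    sum of its pre-trained class vectors; an entity with no declared class
    falls back to a zero vector instead of having no training data."""
    vec = np.zeros(dim)
    for c in classes:
        vec = vec + class_embs[c]
    return vec
```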

Partial Embedding Approach for Knowledge Completion

Wan-Gon Lee, Batselem Jagvaral, Ji-Hun Hong, Hyun-Young Choi, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.11.1168

Knowledge graphs are large networks that describe real-world entities and their relationships with triples. Most knowledge graphs are far from complete, and many previous studies have addressed this problem using low-dimensional graph embeddings. Such methods assume that knowledge graphs are fixed and do not change. However, real-world knowledge graphs evolve at a rapid pace with the addition of new triples. Repeated retraining of embedding models over the entire graph is computationally expensive and impractical. In this paper, we propose a partial embedding method for partial completion of evolving knowledge graphs. Our method employs ontological axioms and contextual information to extract relations of interest, and builds entity and relation embedding models from the instances of those relations. Our experiments demonstrate that the proposed partial embedding method produces results on knowledge graph completion comparable to state-of-the-art methods while reducing the computation time of entity and relation embeddings by 49%–90% on the Freebase and WiseKB datasets.
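In spirit, the partial step amounts to restricting embedding (re)training to instances of the relations of interest rather than the whole graph; a toy filter along those lines (the axiom-based selection of the relations themselves is not shown):

```python
def partial_training_triples(triples, relations_of_interest):
    """Keep only triples whose predicate is a relation of interest, so the
    embedding model is rebuilt over this subset instead of the full,
    continually evolving knowledge graph."""
    return [(s, p, o) for s, p, o in triples if p in relations_of_interest]
```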

Knowledge Completion Modeling using Knowledge Base Embedding

Hyun-Young Choi, Ji-Hun Hong, Wan-Gon Lee, Batselem Jagvaral, Myung-Joong Jeon, Hyun-Kyu Park, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.9.895

In recent years, a number of studies have been conducted with the aim of automatically building a knowledge base from web data. However, due to the incomplete nature of web data, entities may be missing or connections among the present entities may be lacking. To solve this problem, recent studies have proposed methods that train a model to predict the missing data with an artificial neural network based on natural language embeddings, but embedding entities this way has a drawback: in practice, a natural language corpus is not available for many knowledge bases. Therefore, in this paper, we propose a knowledge completion method that converts the RDF data of a knowledge base into RDF-sentences and uses embedding over these to create word vectors. We conducted a triple classification experiment to measure the performance of the proposed method. Compared with existing NTN models, the proposed method obtained 15% higher accuracy on average. In addition, we obtained 88% accuracy by applying the proposed method to the Korean knowledge base WiseKB.
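The RDF-sentence conversion can be sketched minimally: each triple becomes a token sequence that a word-embedding trainer (e.g. word2vec-style training, not shown here) can consume. The function name is an illustrative assumption:

```python
def rdf_sentences(triples):
    """Flatten each (subject, predicate, object) RDF triple into an
    'RDF-sentence' token sequence for word-vector training."""
    return [[s, p, o] for s, p, o in triples]
```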

SWAT: A Study on the Efficient Integration of SWRL and ATMS based on a Distributed In-Memory System

Myung-Joong Jeon, Wan-Gon Lee, Batselem Jagvaral, Hyun-Kyu Park, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.2.113

Recently, with the advent of the Big Data era, vast amounts of knowledge can be acquired from various fields. The collected knowledge is expressed as well-formed formulas; in particular, OWL, the standard ontology language, is a typical such form. Symbolic reasoning over large amounts of ontology data for extracting intrinsic information is being actively studied. However, most studies of such reasoning support only restricted rule expressions based on Description Logic, which limits their applicability to the real world. Moreover, knowledge management for inaccurate information is required, since knowledge inferred from wrong information generates further incorrect information through the dependencies between the inference rules. Therefore, this paper proposes SWAT, a knowledge management system that combines SWRL (Semantic Web Rule Language) reasoning with an ATMS (Assumption-based Truth Maintenance System). The system integrates SWRL reasoning and the ATMS on a distributed in-memory framework to manage large ontology data. On top of this, an ATMS monitoring system allows users to easily detect and correct wrong knowledge. We evaluated the suggested method, which manages knowledge through the retraction of wrong SWRL inference results over large data, using the LUBM (Lehigh University Benchmark) dataset.
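ATMS-style retraction can be sketched as follows: each derived fact records its justifications (premise sets), and retracting a wrong fact also removes every fact left with no surviving justification. This is an illustrative simplification of an ATMS in plain Python, not the SWAT implementation:

```python
def retract(wrong_fact, justifications):
    """justifications maps each derived fact to a list of premise sets.
    A fact survives only while at least one of its justifications has
    all premises surviving; removal propagates along dependencies."""
    removed = {wrong_fact}
    changed = True
    while changed:
        changed = False
        for fact, premise_sets in justifications.items():
            if fact in removed:
                continue
            # every justification is broken -> the fact must go too
            if premise_sets and all(any(p in removed for p in ps)
                                    for ps in premise_sets):
                removed.add(fact)
                changed = True
    return removed
```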

Confidence Value based Large Scale OWL Horst Ontology Reasoning

Wan-Gon Lee, Hyun-Kyu Park, Batselem Jagvaral, Young-Tack Park

http://doi.org/

Several machine learning techniques can automatically populate ontology data from web sources, and interest in large-scale ontology reasoning is increasing accordingly. However, such data carry uncertainty, which propagates into speculative reasoning results. Hence, the reliability of the various data obtained from the web must be taken into account. Large-scale ontology reasoning methods based on trust values are required because quantitative, inference-based reliability for ontologies has so far been insufficient. In this study, we propose a large-scale OWL Horst reasoning method based on confidence values using Spark, a distributed in-memory framework. We describe a method for integrating the confidence values of duplicated data and present a distributed parallel heuristic algorithm that addresses the resulting degradation of reasoning performance. To evaluate the confidence-based reasoning method, we conducted experiments using LUBM3000. The results showed that our approach performs reasoning twice as fast as existing reasoning systems such as WebPIE.
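Integrating the confidence values of duplicated triples can be sketched as below; the max combiner is an illustrative choice for the sketch, not necessarily the paper's exact integration rule:

```python
def integrate_confidences(weighted_triples):
    """Merge duplicate (s, p, o) triples that arrive with different
    confidence values, keeping a single confidence per unique triple."""
    merged = {}
    for s, p, o, conf in weighted_triples:
        key = (s, p, o)
        merged[key] = max(merged.get(key, 0.0), conf)
    return merged
```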

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values

Hyun-Kyu Park, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/

Recently, due to the development of the Internet and electronic devices, the amount of available knowledge and information has increased enormously, and studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or a knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contain inherent uncertainty, and reasoning over such data can produce vague results. To address this uncertainty, we propose an RDFS reasoning approach that utilizes confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches that do not take data uncertainty into account, our approach, built on the in-memory cluster computing framework Spark, computes confidence values for the data inferred through RDFS-based reasoning by applying uncertainty estimation methods; the computed confidence values represent the uncertainty of the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark dataset with arbitrary confidence values added to the ontology triples. Experimental results indicate that the proposed system can process the largest dataset, LUBM3000, in 1179 seconds while inferring 350K triples.
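One RDFS rule with confidence propagation can be sketched like this; the product combiner is an illustrative uncertainty estimate (a common t-norm choice), not necessarily the paper's exact formula:

```python
def infer_types(type_facts, subclass_facts):
    """RDFS rule rdfs9: from (x rdf:type C, c1) and
    (C rdfs:subClassOf D, c2) derive (x rdf:type D, c1 * c2),
    attaching a confidence to each inferred triple instead of
    discarding the uncertainty."""
    supers = {}
    for sub, sup, c2 in subclass_facts:
        supers.setdefault(sub, []).append((sup, c2))
    derived = []
    for x, cls, c1 in type_facts:
        for sup, c2 in supers.get(cls, []):
            derived.append((x, sup, c1 * c2))
    return derived
```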

SPARQL Query Processing in Distributed In-Memory System

Batselem Jagvaral, Wangon Lee, Kang-Pil Kim, Young-Tack Park

http://doi.org/

In this paper, we propose a query processing approach that uses Spark functional programming and a distributed memory system to overcome the computational overhead of SPARQL. In the semantic web, RDF ontology data are produced at large scale, and a main challenge is to query and manipulate such large ontologies with high throughput. Most existing studies on SPARQL have focused on deploying the Hadoop MapReduce framework, and although approaches based on Hadoop MapReduce have shown promising results, they achieve low throughput due to the underlying distributed file processing. Therefore, to speed up query processing, we suggest query processing methods based on memory caching in a distributed memory system. Our approach also integrates a clause unification method that propagates bindings between clauses by exploiting Spark's join, map, and filter methods along with caching. In our experiments, we achieved a high level of performance relative to other approaches; in particular, our performance was nearly similar to that of Sempala, which has been considered the fastest query processing system.
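The clause-unification idea, matching each triple pattern and then joining bindings on shared variables (which Spark would express with map/filter/join over cached RDDs), can be sketched in plain Python:

```python
def match_pattern(triples, pattern):
    """Produce variable bindings ('?x'-style terms) of one triple
    pattern against the triple set (the map/filter stage)."""
    results = []
    for triple in triples:
        binding, ok = {}, True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

def join_bindings(left, right):
    """Join two binding sets on their shared variables (the clause
    unification / Spark join stage)."""
    return [{**a, **b} for a in left for b in right
            if all(a[k] == b[k] for k in a.keys() & b.keys())]
```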

Scalable RDFS Reasoning Using the Graph Structure of In-Memory based Parallel Computing

MyungJoong Jeon, ChiSeoung So, Batselem Jagvaral, KangPil Kim, Jin Kim, JinYoung Hong, YoungTack Park

http://doi.org/

In recent years, there has been growing interest in RDFS inference for building rich knowledge bases. However, it is difficult to improve inference performance over large data using a single machine, so researchers have been developing RDFS inference engines for distributed computing environments. The existing inference engines, however, cannot process data in real time, are difficult to implement, and handle repetitive tasks inefficiently. To overcome these problems, we propose a method for constructing an in-memory distributed inference engine that uses a parallel graph structure. An ontology based on triples naturally possesses a graph structure, so it is intuitive to design a graph-structure-based inference engine. Moreover, the RDFS inference rules can be implemented with the operators of the graph structure, allowing the engine to be designed around the graph structure rather than the structure of a data table. We evaluate the proposed inference engine on the LUBM1000 and LUBM3000 datasets to test inference speed. The results of our experiment indicate that the proposed in-memory distributed inference engine performs about 10 times faster than an in-storage inference engine.
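Treating triples as labeled edges, an RDFS rule becomes a graph operation; for example, rdfs7 (subPropertyOf) becomes edge relabeling. A sketch with plain Python sets standing in for the distributed graph:

```python
def rdfs7(edges, subproperty):
    """rdfs7: an edge (s, p, o) plus (p rdfs:subPropertyOf q) yields a
    new edge (s, q, o); implemented as relabeling edges of the graph
    rather than joining rows of a data table."""
    derived = set()
    for s, p, o in edges:
        for q in subproperty.get(p, ()):
            derived.add((s, q, o))
    return derived
```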


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr