Search : [ author: Batselem Jagvaral ] (13)

An Explainable Knowledge Completion Model Using Explanation Segments

Min-Ho Lee, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/10.5626/JOK.2021.48.6.680

Recently, a large number of studies using deep learning have been conducted to predict new links in incomplete knowledge graphs. However, link prediction with deep learning has a major limitation: the inferred results cannot be explained. We propose a knowledge completion model that yields explainable inference paths supporting its predictions. We extract paths from the knowledge graph to the object entity using a path ranking algorithm and define them as explanation segments. The generated explanation segments are then embedded using a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM). The link prediction model is trained with an attention mechanism based on the semantic similarity between the embedded explanation segments and the candidate predicates to be inferred. The explanation segment best suited to explaining a link prediction is selected according to the measured attention scores. To evaluate the proposed method, we performed a link prediction comparison experiment and an accuracy verification experiment measuring the proportion of explanation segments suitable for explaining the link prediction results. We used the benchmark datasets NELL-995, FB15K-237, and Countries; the accuracy verification experiments showed accuracies of 89%, 44%, and 97%, respectively. Compared with the existing method, performance on NELL-995 and FB15K-237 was 35%p and 21%p higher on average.
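The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the segment strings, embedding vectors, and dimensionality are invented, and dot-product attention with a softmax stands in for whatever scoring function the model actually learns.

```python
import math

def attention_scores(segment_vecs, relation_vec):
    """Dot-product similarity between each explanation-segment embedding
    and the candidate relation embedding, normalized with softmax."""
    sims = [sum(s * r for s, r in zip(seg, relation_vec)) for seg in segment_vecs]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def best_explanation(segments, segment_vecs, relation_vec):
    """Pick the explanation segment with the highest attention score."""
    scores = attention_scores(segment_vecs, relation_vec)
    return segments[max(range(len(scores)), key=scores.__getitem__)]

# Toy example: two candidate explanation paths for one predicted link.
segments = ["athletePlaysFor -> teamPlaysInLeague", "bornIn -> cityInState"]
vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2]]
relation = [1.0, 0.0, 0.0]
chosen = best_explanation(segments, vecs, relation)
```

Here the first segment's embedding aligns best with the candidate relation, so it would be returned as the explanation.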

Path Embedding-Based Knowledge Graph Completion Approach

Batselem Jagvaral, Min-Sung Kim, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.8.722

Knowledge graphs are widely used in question answering systems, but many of the relations between entities in such graphs are missing. To solve this issue, we propose a CNN (convolutional neural network) + BiLSTM (bidirectional LSTM) based approach to infer missing links in knowledge graphs. Our method embeds paths connecting two entities into a low-dimensional space via a CNN and a BiLSTM. An attention operation then combines the path embeddings into a representation of the two entities. Finally, we measure the similarity between the target relation and this representation to predict whether or not the relation connects those entities. By combining a CNN and a BiLSTM, we exploit the CNN's ability to recognize local patterns and the LSTM's ability to model the ordering of entities and relations. This makes it possible to effectively identify low-dimensional path features and predict relationships between entities from the learned features. In our experiments, we performed link prediction tasks on four different knowledge graphs and showed that our method achieves results comparable to state-of-the-art methods.
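A stripped-down sketch of the encoding pipeline: a width-2 convolution over the token embeddings of a path, followed by a forward and a backward recurrent pass. All weights and embeddings here are made up, and a plain tanh RNN stands in for each LSTM direction; the real model learns these parameters end to end.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def conv1d(seq, w, b):
    """Slide a width-2 filter over consecutive token embeddings (toy CNN)."""
    out = []
    for i in range(len(seq) - 1):
        window = seq[i] + seq[i + 1]          # concatenate two embeddings
        out.append(relu([sum(wi * x for wi, x in zip(row, window)) + b
                         for row in w]))
    return out

def rnn_pass(seq, reverse=False):
    """Single tanh-RNN direction, a simplified stand-in for one LSTM half."""
    h = [0.0] * len(seq[0])
    for x in (reversed(seq) if reverse else seq):
        h = [math.tanh(0.5 * hi + 0.5 * xi) for hi, xi in zip(h, x)]
    return h

def encode_path(tokens, emb, w, b=0.0):
    """CNN features, then forward + backward passes concatenated (Bi-)."""
    seq = [emb[t] for t in tokens]
    feats = conv1d(seq, w, b)
    return rnn_pass(feats) + rnn_pass(feats, reverse=True)

# Toy path "lakers -> playsIn -> nba" with 2-dimensional embeddings.
emb = {"lakers": [1.0, 0.0], "playsIn": [0.0, 1.0], "nba": [1.0, 1.0]}
w = [[0.25, 0.25, 0.25, 0.25], [0.1, 0.2, 0.3, 0.4]]
path_vec = encode_path(["lakers", "playsIn", "nba"], emb, w)
```

The resulting path vector (forward and backward halves concatenated) is what the attention operation would then combine across paths.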

Approach for Managing Multiple Class Membership in Knowledge Graph Completion Using Bi-LSTM

Jae-Seung Roh, Batselem Jagvaral, Wan-Gon Lee, Young-Tack Park

http://doi.org/10.5626/JOK.2020.47.6.559

Knowledge graphs, which represent real-world information in a structured way, are widely used in areas such as web browsing and recommendation systems. However, links between entities in knowledge graphs are often missing. To resolve this issue, various studies using embedding techniques or deep learning have been proposed; in particular, a recent study combining a CNN and a bidirectional LSTM showed high performance compared to previous work. However, in that study, if multiple class types are defined for a single entity, the amount of training data, and with it the training time, increases exponentially. Also, if no class type information is defined for an entity, training data for that entity cannot be generated. To generate training data for such entities and to manage multiple class membership in knowledge graph completion, we propose two approaches that use pre-trained knowledge graph embedding vectors and the concept of vector addition. To evaluate the proposed methods, we conducted comparative experiments against existing knowledge completion studies on the NELL-995 and FB15K-237 datasets and obtained MAP and MRR scores 1.6%p and 1.5%p higher, respectively, than those of previous studies.
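The vector-addition idea can be illustrated with a small sketch. The class names and embedding values below are invented for illustration; the paper's approach uses pre-trained knowledge graph embeddings, whereas these are hand-written toy vectors.

```python
def class_membership_vector(entity_classes, class_emb):
    """Combine the pre-trained embeddings of every class an entity belongs
    to by vector addition, yielding one vector per entity regardless of
    how many class types it has."""
    dim = len(next(iter(class_emb.values())))
    total = [0.0] * dim
    for c in entity_classes:
        total = [t + v for t, v in zip(total, class_emb[c])]
    return total

# Toy class embeddings; an entity with several types still gets one vector,
# so training data no longer multiplies with the number of class types.
class_emb = {"athlete": [1.0, 0.0], "person": [0.0, 1.0], "coach": [0.5, 0.5]}
vec = class_membership_vector(["athlete", "person"], class_emb)
```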

Partial Embedding Approach for Knowledge Completion

Wan-Gon Lee, Batselem Jagvaral, Ji-Hun Hong, Hyun-Young Choi, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.11.1168

Knowledge graphs are large networks that describe real-world entities and their relationships with triples. Most knowledge graphs are far from complete, and many previous studies have addressed this problem using low-dimensional graph embeddings. Such methods assume that knowledge graphs are fixed and do not change. However, real-world knowledge graphs evolve at a rapid pace as new triples are added, and repeatedly retraining embedding models over the entire graph is computationally expensive and impractical. In this paper, we propose a partial embedding method for the partial completion of evolving knowledge graphs. Our method employs ontological axioms and contextual information to extract relations of interest and builds entity and relation embedding models from the instances of those relations. Our experiments demonstrate that the proposed partial embedding method produces results comparable to state-of-the-art methods on knowledge graph completion while significantly reducing the computation time of entity and relation embeddings by 49%-90% on the Freebase and WiseKB datasets.
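The subset-extraction step can be sketched as follows. This only shows the filtering of triples by a set of target relations; how the relations of interest are chosen (via ontological axioms and context) and how the embedding model is then trained are beyond this toy example, and the example triples are invented.

```python
def partial_training_set(triples, relations_of_interest):
    """Keep only triples whose predicate is a relation of interest, plus
    the entities they mention, so the embedding model can be rebuilt on
    this subset instead of retraining over the whole evolving graph."""
    subset = [(s, p, o) for (s, p, o) in triples if p in relations_of_interest]
    entities = {e for (s, _, o) in subset for e in (s, o)}
    return subset, entities

triples = [
    ("Seoul", "capitalOf", "Korea"),
    ("Seoul", "locatedIn", "Asia"),
    ("Paris", "capitalOf", "France"),
]
subset, entities = partial_training_set(triples, {"capitalOf"})
```

Only the `capitalOf` instances and their entities would be embedded, leaving the rest of the graph untouched.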

Knowledge Completion Modeling using Knowledge Base Embedding

Hyun-Young Choi, Ji-Hun Hong, Wan-Gon Lee, Batselem Jagvaral, Myung-Joong Jeon, Hyun-Kyu Park, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.9.895

In recent years, a number of studies have aimed to automatically build knowledge bases from web data. However, because web data is incomplete, data or connections among the entities it contains may be missing. To solve this problem, recent studies have proposed methods that train a model to predict the missing data with an artificial neural network based on natural language embeddings, but embedding entities this way has a drawback: in practice, no natural language corpus is available for many knowledge bases. Therefore, in this paper, we propose a knowledge completion method that converts the RDF data of a knowledge base into RDF-sentences and applies embedding over them to create word vectors. We conducted a triple classification experiment to measure the performance of the proposed method. Compared with existing NTN models, the proposed method achieved 15% higher accuracy on average. In addition, it achieved 88% accuracy when applied to the Korean knowledge base WiseKB.
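A minimal sketch of the triple-to-sentence conversion, assuming the common convention that a URI's local name (the part after the last `#` or `/`) becomes the token; the example URIs are invented and the paper's exact linearization may differ.

```python
def local_name(uri):
    """Strip namespace prefixes so tokens match across triples."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]

def rdf_sentence(triple):
    """Linearize one RDF triple into a token sequence that an off-the-shelf
    word-embedding model (e.g., word2vec) can consume."""
    return " ".join(local_name(t) for t in triple)

sent = rdf_sentence(("http://ex.org/Seoul",
                     "http://ex.org/capitalOf",
                     "http://ex.org/Korea"))
```

Feeding many such sentences to a word-embedding model yields vectors for entities and relations without requiring any natural language corpus.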

SWAT: A Study on the Efficient Integration of SWRL and ATMS based on a Distributed In-Memory System

Myung-Joong Jeon, Wan-Gon Lee, Batselem Jagvaral, Hyun-Kyu Park, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.2.113

Recently, with the advent of the Big Data era, it has become possible to acquire vast amounts of knowledge from various fields. The collected knowledge is expressed as well-formed formulas; in particular, OWL, the standard ontology language, is a typical such form. Symbolic reasoning over large amounts of ontology data is being actively studied to extract the intrinsic information it contains. However, most such reasoners support only restricted rule expressions based on Description Logic, limiting their applicability to the real world. Moreover, knowledge management for inaccurate information is required, since knowledge inferred from wrong information will generate further incorrect information through the dependencies between inference rules. Therefore, this paper proposes SWAT, a knowledge management system that combines SWRL (Semantic Web Rule Language) reasoning with an ATMS (Assumption-based Truth Maintenance System). The system integrates SWRL reasoning and the ATMS to manage large ontology data on a distributed in-memory framework. On top of this, an ATMS monitoring system allows users to easily detect and correct wrong knowledge. We evaluated the proposed method, which manages knowledge through the retraction of wrong SWRL inference data at scale, using the LUBM (Lehigh University Benchmark) dataset.

Extracting Rules from Neural Networks with Continuous Attributes

Batselem Jagvaral, Wan-Gon Lee, Myung-joong Jeon, Hyun-Kyu Park, Young-Tack Park

http://doi.org/10.5626/JOK.2018.45.1.22

Over the past decades, neural networks have been successfully used in numerous applications, from speech recognition to image classification. However, these networks cannot explain their results, and one often needs to know how and why a specific conclusion was drawn. Most studies focus on extracting binary rules from neural networks, which is often impractical, since the datasets used in machine learning applications contain continuous values. To fill this gap, this paper presents an algorithm for extracting logic rules from a trained neural network for data with continuous attributes. It uses hyperplane-based linear classifiers to extract rules with numeric values from the trained weights between the input and hidden layers, and then combines these classifiers with binary rules learned from the hidden and output layers to form non-linear classification rules. Experiments with different datasets show that the proposed approach can accurately extract logical rules for data with non-linear continuous attributes.
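The first stage, turning an input-to-hidden hyperplane into a readable numeric condition, can be sketched as follows. The weights, bias, and attribute names are invented, and combining these conditions with the hidden-to-output binary rules is omitted.

```python
def hyperplane_rule(weights, bias, attr_names):
    """Render one input-to-hidden hyperplane (w.x + b > 0) as a readable
    linear condition over the continuous attributes."""
    terms = " + ".join(f"{w:.2f}*{n}" for w, n in zip(weights, attr_names))
    return f"{terms} > {-bias:.2f}"

def unit_fires(weights, bias, x):
    """The hidden unit is 'on' exactly when its hyperplane condition holds."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0

# Toy trained weights for one hidden unit over two continuous attributes.
rule = hyperplane_rule([0.8, -0.5], -0.2, ["age", "income"])
```

Each hidden unit contributes one such inequality; the binary rules learned over the hidden layer then decide how the inequalities combine into non-linear classification rules.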

Distributed Assumption-Based Truth Maintenance System for Scalable Reasoning

Batselem Jagvaral, Young-Tack Park

http://doi.org/

An assumption-based truth maintenance system (ATMS) is a tool that maintains the reasoning process of an inference engine. It also supports non-monotonic reasoning based on dependency-directed backtracking. By bookkeeping all reasoning processes, it can quickly check and retract beliefs and efficiently provide solutions for problems with large search spaces. However, the amount of data has grown exponentially in recent years, making it impossible to solve large-scale problems on a single machine, and maintaining the reasoning process for such problems incurs high computation cost due to large memory overhead. To overcome this drawback, this paper presents an approach to incrementally maintaining the reasoning process of an inference engine on a cluster using Spark. It maintains data dependencies such as assumptions, labels, environments, and justifications in parallel across a cluster of machines and efficiently updates changes in large amounts of inferred data. We deployed the proposed ATMS on a cluster of 5 machines, conducted OWL/RDFS reasoning over the Lehigh University Benchmark (LUBM) data, and evaluated our system in terms of performance and functionality, including assertion, explanation, and retraction. In our experiments, the proposed system performed these operations in a reasonably short time over an 80GB inferred LUBM2000 dataset.
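The retraction operation at the heart of an ATMS can be illustrated in miniature. This sketch records a single justification per conclusion and runs on one machine; a real ATMS maintains sets of environments per node, and the paper's contribution is doing this bookkeeping in parallel on Spark.

```python
def retract(assumption, justifications):
    """Dependency-directed retraction: removing an assumption invalidates
    every conclusion whose recorded justification depends on it, directly
    or transitively."""
    invalid = {assumption}
    changed = True
    while changed:
        changed = False
        for conclusion, premises in justifications.items():
            if conclusion not in invalid and invalid & set(premises):
                invalid.add(conclusion)
                changed = True
    return invalid

# B was inferred from assumption A; C was inferred from B and D.
justifications = {"B": ["A"], "C": ["B", "D"]}
gone = retract("A", justifications)
```

Retracting `A` takes down `B` and then `C`, exactly the cascade the bookkeeping exists to compute without re-running the inference engine.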

Confidence Value based Large Scale OWL Horst Ontology Reasoning

Wan-Gon Lee, Hyun-Kyu Park, Batselem Jagvaral, Young-Tack Park

http://doi.org/

Several machine learning techniques can automatically populate ontology data from web sources, and interest in large-scale ontology reasoning is increasing accordingly. However, data collected from the web carries uncertainty, which propagates into speculative reasoning results, so the reliability of such data must be taken into account. Large-scale ontology reasoning methods based on confidence values are therefore required, because quantitative reasoning that ignores reliability is insufficient. In this study, we propose a large-scale OWL Horst reasoning method based on confidence values using Spark, a distributed in-memory framework. We describe a method for integrating the confidence values of duplicated data, and a distributed parallel heuristic algorithm that addresses the resulting degradation of reasoning performance. To evaluate confidence-based reasoning, we conducted experiments using LUBM3000. The results showed that our approach performs reasoning twice as fast as existing reasoning systems such as WebPIE.
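One way the duplicate-integration step could look is sketched below. The abstract does not specify the combination function, so a noisy-OR rule is an assumed, illustrative choice, and the example triples and confidence values are invented.

```python
def merge_confidence(c1, c2):
    """Noisy-OR combination: two independent sources asserting the same
    triple yield higher confidence than either alone (assumed rule)."""
    return 1.0 - (1.0 - c1) * (1.0 - c2)

def integrate(triples_with_conf):
    """Collapse duplicated triples into one entry with merged confidence."""
    merged = {}
    for triple, conf in triples_with_conf:
        merged[triple] = (merge_confidence(merged[triple], conf)
                          if triple in merged else conf)
    return merged

# The same triple extracted twice from the web with different confidences.
facts = [(("Seoul", "capitalOf", "Korea"), 0.6),
         (("Seoul", "capitalOf", "Korea"), 0.5)]
result = integrate(facts)
```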

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values

Hyun-Kyu Park, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/

Recently, due to the development of the Internet and electronic devices, the amount of available knowledge and information has increased enormously, and studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or a knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contains uncertainty, and reasoning over such data can produce vague results. To address this issue, we propose an RDFS reasoning approach that utilizes confidence values indicating the degree of uncertainty in the collected data. Unlike conventional approaches that do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark to compute confidence values for the data inferred through RDFS-based reasoning, applying uncertainty estimation methods. The computed confidence values thus represent the uncertainty in the inferred data. To evaluate our approach, we carried out ontology reasoning over the LUBM standard benchmark datasets with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system can process the largest dataset, LUBM3000, in 1,179 seconds while inferring 350K triples.
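A toy illustration of confidence propagation through one RDFS entailment rule (rdfs9, subclass inheritance). Taking the minimum of the premise confidences is an assumed, conservative propagation choice; the paper's actual uncertainty estimation methods and the example data are not specified here.

```python
def rdfs9(type_triples, subclass_triples):
    """One application of rule rdfs9: (x type C) and (C subClassOf D)
    entail (x type D). The inferred confidence is the minimum of the two
    premise confidences (an assumed, conservative propagation rule)."""
    inferred = {}
    for (x, c), cv in type_triples.items():
        for (sc, d), sv in subclass_triples.items():
            if sc == c:
                v = min(cv, sv)
                key = (x, d)
                if v > inferred.get(key, 0.0):
                    inferred[key] = v
    return inferred

# Toy premises, each carrying a confidence value.
types = {("alice", "GraduateStudent"): 0.9}
subclasses = {("GraduateStudent", "Student"): 0.8}
new_triples = rdfs9(types, subclasses)
```

Iterating such rules to a fixpoint, with the joins distributed over Spark, gives the shape of the confidence-aware reasoning the abstract describes.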


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr