Search : [ author: Young-Tack Park ] (34)

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values

Hyun-Kyu Park, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/

Recently, the development of the Internet and electronic devices has brought an enormous increase in the amount of available knowledge and information. As this growth has proceeded, studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contains inherent uncertainty, and reasoning over such data can produce vague results. To address this uncertainty, we propose an RDFS reasoning approach that utilizes confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark to compute confidence values for the data inferred through RDFS-based reasoning by applying uncertainty estimation methods. As a result, the computed confidence values represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data set, with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system is capable of running over the largest data set, LUBM3000, in 1179 seconds, inferring 350K triples.
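
The abstract above does not say which uncertainty estimation method is applied, so the following is only a minimal sketch of the general idea in plain Python rather than Spark. It assumes (hypothetically) that an inferred triple's confidence is the product of its premises' confidences and that the highest value is kept when a triple can be derived in several ways. The function name `infer_with_confidence` and the sample data are illustrative and do not come from the paper.

```python
# Sketch only: confidence-aware RDFS reasoning with rules rdfs11 and rdfs9.
# Assumption (not from the paper): confidence of an inferred triple is the
# product of its premises' confidences; the maximum wins across derivations.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_with_confidence(triples):
    """triples: dict mapping (subject, predicate, object) -> confidence in [0, 1]."""
    known = dict(triples)
    changed = True
    while changed:  # repeat until no triple is added or improved
        changed = False
        derived = {}
        items = list(known.items())
        for (s1, p1, o1), c1 in items:
            for (s2, p2, o2), c2 in items:
                # Both rules join a triple with a subClassOf triple on o1 == s2.
                if p2 != SUBCLASS or o1 != s2:
                    continue
                if p1 == SUBCLASS:    # rdfs11: subClassOf is transitive
                    new = (s1, SUBCLASS, o2)
                elif p1 == TYPE:      # rdfs9: instances inherit superclasses
                    new = (s1, TYPE, o2)
                else:
                    continue
                conf = c1 * c2
                if conf > derived.get(new, 0.0):
                    derived[new] = conf
        for triple, conf in derived.items():
            if conf > known.get(triple, 0.0):
                known[triple] = conf
                changed = True
    return known


# Hypothetical input: three triples with arbitrary confidence values.
example = {
    ("GraduateStudent", SUBCLASS, "Student"): 0.9,
    ("Student", SUBCLASS, "Person"): 0.8,
    ("alice", TYPE, "GraduateStudent"): 0.5,
}
closure = infer_with_confidence(example)
for triple, conf in sorted(closure.items()):
    print(triple, round(conf, 2))
```

Other combination functions (for example, taking the minimum of the premises' confidences) would slot into the `conf = c1 * c2` line without changing the rest of the loop.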

An Approach of Scalable SHIF Ontology Reasoning using Spark Framework

Je-Min Kim, Young-Tack Park

http://doi.org/

Managing a knowledge system requires systems that automatically infer and manage scalable knowledge. Most of these systems use ontologies to exchange knowledge between machines and to infer new knowledge. Therefore, approaches are needed that infer new knowledge for scalable ontologies. In this paper, we propose an approach to perform rule-based reasoning for scalable SHIF ontologies in the Spark framework, which works similarly to MapReduce but in the distributed memories of a cluster. To perform efficient reasoning in distributed memories, we focus on three areas. First, we define a data structure for splitting scalable ontology triples into small sets according to each reasoning rule and loading these triple sets into distributed memories. Second, we define a rule execution order and iteration conditions based on the dependencies and correlations among the SHIF rules. Finally, we explain the operations that are adapted to execute the rules; these operations are based on reasoning algorithms. To evaluate the suggested methods, we perform a comparative experiment against WebPIE, a representative cluster-based ontology reasoner, using the LUBM set, which is formal data used to evaluate ontology inference and search speed. The proposed approach improves throughput by 28,400% (157k/sec) over WebPIE (553/sec) with LUBM.

SPARQL Query Processing in Distributed In-Memory System

Batselem Jagvaral, Wangon Lee, Kang-Pil Kim, Young-Tack Park

http://doi.org/

In this paper, we propose a query processing approach that uses Spark's functional programming model and distributed memory system to reduce the computational overhead of SPARQL. In the semantic web, RDF ontology data is produced at large scale, and the main challenge for the semantic web is to query and manipulate such large ontologies with high throughput. Most existing studies on SPARQL have focused on deploying the Hadoop MapReduce framework. Although approaches based on Hadoop MapReduce have shown promising results, they achieve a low level of throughput due to the underlying distributed file processes. Therefore, in order to speed up query processing, we suggest query processing methods based on memory caching in a distributed memory system. Our approach is also integrated with a clause unification method for propagating bindings between clauses, which exploits the Spark join, map and filter methods along with caching. In our experiments, we achieved a high level of performance relative to other approaches. In particular, our performance was close to that of Sempala, which has been considered the fastest query processing system.
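
To make the filter, map and join steps mentioned above concrete, here is a small sketch in plain Python. Ordinary lists stand in for cached distributed collections, and the helper names (`match_pattern`, `join`, `evaluate`) and the sample triples are hypothetical. It is not the paper's clause unification method, only an illustration of evaluating a SPARQL basic graph pattern by joining variable bindings.

```python
# Sketch only: evaluating a SPARQL basic graph pattern with filter, map and
# join steps. Plain lists stand in for cached distributed collections; all
# names and data here are illustrative assumptions, not the paper's API.

def is_var(term):
    """SPARQL variables are written with a leading question mark."""
    return term.startswith("?")


def match_pattern(triples, pattern):
    """Filter + map: keep matching triples and emit their variable bindings."""
    results = []
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable repeated inside one pattern must bind consistently.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left, right):
    """Join: merge binding pairs that agree on every shared variable."""
    joined = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined


def evaluate(triples, patterns):
    """Evaluate the patterns left to right, joining bindings as we go."""
    bindings = [{}]
    for pattern in patterns:
        bindings = join(bindings, match_pattern(triples, pattern))
    return bindings


# Hypothetical data and query:
#   SELECT ?x ?c WHERE { ?x rdf:type Student . ?x takesCourse ?c }
data = [
    ("alice", "rdf:type", "Student"),
    ("bob", "rdf:type", "Student"),
    ("alice", "takesCourse", "course1"),
    ("carol", "takesCourse", "course1"),
]
query = [("?x", "rdf:type", "Student"), ("?x", "takesCourse", "?c")]
rows = evaluate(data, query)
print(rows)
```

In a distributed setting the nested loop in `join` would typically be replaced by a keyed join on the shared variables, which is where caching intermediate bindings in memory pays off.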

ABox Realization Reasoning in Distributed In-Memory System

Wan-Gon Lee, Young-Tack Park

http://doi.org/

As the amount of knowledge information increases significantly, considerable progress has been made in studies on how to reason effectively over large-scale ontologies at the level of RDFS or OWL. These reasoning methods are divided into TBox classification and ABox realization. TBox classification mainly deals with integrity and dependencies in the schema, whereas ABox realization mainly handles a variety of issues in instances. Therefore, ABox realization is very important in practical applications. In this paper, we propose a realization method that analyzes the constraints of a specified class, so that the reasoning system automatically infers the classes to which instances belong. Unlike conventional methods that rely on distributed file systems based on object-oriented languages, we propose a large-scale ontology reasoning method using Spark, which is a functional programming-based in-memory system. To verify the effectiveness of the proposed method, we used instances created from the W3C Wine ontology (120 to 600 million triples). In our largest experiment, the proposed system processed 600 million triples and generated 951 million triples in 51 minutes (696K triples/sec).

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories

Je-Min Kim, Young-Tack Park

http://doi.org/

Current ontology studies use the Hadoop distributed storage framework to perform MapReduce algorithm-based reasoning for scalable ontologies. In this paper, however, we propose a novel approach for scalable Web Ontology Language (OWL) Horst Lite ontology reasoning, based on distributed cluster memories. Rule-based reasoning, which is frequently used for scalable ontologies, iteratively executes triple-format ontology rules until no new data is inferred. Therefore, when scalable ontology reasoning is performed on computer hard drives, the ontology reasoner suffers from performance limitations. In order to overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories using Spark (a memory-based distributed computing framework), which executes the ontology reasoning. In order to implement an appropriate OWL Horst Lite ontology reasoning system on Spark, our method divides the scalable ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. To experimentally evaluate the methods suggested in this paper, we used the Lehigh University Benchmark (LUBM), which is used to evaluate ontology inference and search speed, and applied the methods to LUBM8000 (1.1 billion triples, 155 gigabytes). When compared with WebPIE, a representative MapReduce algorithm-based scalable ontology reasoner, the proposed approach showed a throughput improvement of 320% (62k/s) over WebPIE (19k/s).
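
The iterate-until-nothing-new loop described above can be sketched as follows. This is a single-machine illustration in plain Python with only two OWL Horst style rules (transitive and symmetric properties). The function names and the `partOf` sample data are assumptions made for the example; the paper's block partitioning and Spark execution are not modeled.

```python
# Sketch only: rule-based reasoning iterated to a fixpoint, i.e. until a pass
# over the data infers no new triples. Two OWL Horst style rules are covered;
# names and sample data are illustrative assumptions.

TYPE = "rdf:type"
TRANSITIVE = "owl:TransitiveProperty"
SYMMETRIC = "owl:SymmetricProperty"


def apply_rules(triples):
    """Run one pass of both rules over the whole triple set."""
    transitive = {s for s, p, o in triples if p == TYPE and o == TRANSITIVE}
    symmetric = {s for s, p, o in triples if p == TYPE and o == SYMMETRIC}
    inferred = set()
    for s, p, o in triples:
        if p in symmetric:
            # (s p o) with p symmetric implies (o p s)
            inferred.add((o, p, s))
        if p in transitive:
            # (s p o) and (o p o2) with p transitive implies (s p o2)
            for s2, p2, o2 in triples:
                if p2 == p and s2 == o:
                    inferred.add((s, p, o2))
    return inferred


def reason(triples):
    """Repeat rule passes until no new triple appears; return closure and rounds."""
    closure = set(triples)
    rounds = 0
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure, rounds
        closure |= new
        rounds += 1


# Hypothetical input: a chain of three partOf links over a transitive property.
facts = {
    ("partOf", TYPE, TRANSITIVE),
    ("a", "partOf", "b"),
    ("b", "partOf", "c"),
    ("c", "partOf", "d"),
}
closure, rounds = reason(facts)
print(len(closure), "triples after", rounds, "productive rounds")
```

Every round rescans the full triple set, which is why the abstract argues that keeping the data in memory matters: on disk-backed storage each additional round repeats the I/O cost.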

MOnCa2: High-Level Context Reasoning Framework based on User Travel Behavior Recognition and Route Prediction for Intelligent Smartphone Applications

Je-Min Kim, Young-Tack Park

http://doi.org/

MOnCa2 is a framework for building intelligent smartphone applications based on smartphone sensors and ontology reasoning. In previous studies, MOnCa determined and inferred user situations based on sensor values represented by ontology instances. When this approach is applied, recognizing user space information or objects in user surroundings is possible, whereas determining the user’s physical context (travel behavior, travel destination) is impossible. In this paper, MOnCa2 is used to build recognition models for travel behavior and routes using smartphone sensors to analyze the user’s physical context, infer basic context regarding the user’s travel behavior and routes by adapting these models, and generate high-level context by applying ontology reasoning to the basic context for creating intelligent applications. This paper is focused on approaches that are able to recognize the user’s travel behavior using smartphone accelerometers, predict personal routes and destinations using GPS signals, and infer high-level context by applying realization.

Distributed Table Join for Scalable RDFS Reasoning on Cloud Computing Environment

Wan-Gon Lee, Je-Min Kim, Young-Tack Park

http://doi.org/

A knowledge service system needs to infer new knowledge from stated knowledge in order to provide effective service. Most knowledge service systems express their knowledge as ontologies. As the volume of real-world knowledge information grows massive, effective techniques for handling massive ontology data are drawing attention. This paper presents methods for reasoning over massive ontology data at the RDFS level in a cloud computing environment and evaluates their performance. The RDFS inference suggested in this paper focuses on two methods: one that applies MapReduce based on an RDFS meta table, and one that uses only cloud computing memory, without MapReduce, in a distributed file computing environment. Accordingly, this paper explains the inference system structure of each technique, the meta table setup according to the RDFS inference rules, and the algorithm of the inference strategy. To evaluate the suggested methods, we perform experiments with the LUBM set, which is formal data used to evaluate ontology inference and search speed. For LUBM6000, the RDFS inference technique based on the meta table required 13.75 minutes (inferring 1,042 triples per second) to complete the full inference, whereas the method using cloud computing memory needed 7.24 minutes (inferring 1,979 triples per second), making it roughly twice as fast.

Knowledge Completion System using Neuro-Symbolic-based Rule Induction and Inference Engine

Won-Chul Shin, Hyun-Kyu Park, Young-Tack Park

http://doi.org/10.5626/JOK.2021.48.11.1202

Recently, there have been several studies on knowledge completion methods aimed at solving the problem of incomplete knowledge graphs. Methods such as Neural Theorem Prover (NTP), which combines the advantages of deep learning methods and logic systems, have outperformed existing methods. However, NTP faces challenges in processing large-scale knowledge graphs because all the triples of the knowledge graph are involved in the computation to obtain prediction results for one input. In this paper, we propose an integrated system of deep learning and logic inference methods that learns vector representations of symbols for rule induction using a model that improves on the computational complexity of NTP, and performs knowledge inference from the induced rules using an inference engine. To verify the rule-induction performance of the rule generation model, we compared its test data inference ability with that of NTP using induced rules on the Nations, Kinship, and UMLS data sets. Experiments with Kdata and WiseKB knowledge inference through the inference engine resulted in a 30% increase in Kdata and a 95% increase in WiseKB compared to the knowledge graphs used in the experiments.

An Explainable Knowledge Completion Model Using Explanation Segments

Min-Ho Lee, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/10.5626/JOK.2021.48.6.680

Recently, a large number of studies have used deep learning to predict new links in incomplete knowledge graphs. However, link prediction using deep learning has a major limitation: the inferred results cannot be explained. We propose a high-utility knowledge graph prediction model that yields explainable inference paths supporting the inference results. We extract paths to the object from the knowledge graph using a path ranking algorithm and define them as explanation segments. Then, the generated explanation segments are embedded using a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM). The link prediction model is then trained by applying an attention mechanism based on the semantic similarity between the embedded explanation segments and the candidate predicates to be inferred. The explanation segment suitable for explaining the link prediction is selected based on the measured attention scores. To evaluate the performance of the proposed method, we perform a link prediction comparison experiment and an accuracy verification experiment that measures the proportion of explanation segments suitable for explaining the link prediction results. We used the benchmark datasets NELL-995, FB15K-237, and Countries for the experiments, and the accuracy verification experiments showed accuracies of 89%, 44%, and 97%, respectively. Compared with the existing method, the NELL-995 and FB15K-237 data exhibited 35%p and 21%p higher performance on average.

Knowledge Completion System through Learning the Relationship between Query and Knowledge Graph

Min-Sung Kim, Min-Ho Lee, Wan-Gon Lee, Young-Tack Park

http://doi.org/10.5626/JOK.2021.48.6.649

A knowledge graph is a network comprising relationships between entities. Knowledge graphs suffer from missing or incorrect relationships between specific entities. Numerous studies have proposed learning methods that use artificial neural networks based on natural language embedding to solve the problems of incomplete knowledge graphs, and various knowledge graph completion systems built on these methods are being studied. In this paper, we propose a system that infers missing knowledge using specific queries and knowledge graphs. First, a topic is automatically extracted from a query, and the topic embedding is obtained from the knowledge graph embedding module. Next, a new triple is inferred by learning the relationship between the topic from the knowledge graph and the query, using the query embedding and the knowledge graph embedding. With this method, the missing knowledge was inferred, and the predicate embedding of the knowledge graph related to a specific query was used to achieve good performance. In addition, an experiment was conducted using the MetaQA dataset to compare the performance of the proposed method with that of existing methods. For the experiment, we used a knowledge graph in the movie domain. Considering both the entire knowledge graph and a knowledge graph with missing knowledge, we experimented on a knowledge graph in which 50% of the triples were randomly omitted. The proposed method obtained better performance than the existing method.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr