Search : [ author: 박영택 ] (41)

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values

Hyun-Kyu Park, Wan-Gon Lee, Batselem Jagvaral, Young-Tack Park

http://doi.org/

Recently, due to the development of the Internet and electronic devices, there has been an enormous increase in the amount of available knowledge and information. As this growth has proceeded, studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet, the collected ontology data contains specific uncertainty and reasoning such data can cause vagueness in reasoning results. In order to solve the uncertainty issue, we propose an RDFS reasoning approach that utilizes confidence values indicating degrees of uncertainty in the collected data. Unlike conventional reasoning approaches that have not taken into account data uncertainty, by using the in-memory based cluster computing framework Spark, our approach computes confidence values in the data inferred through RDFS-based reasoning by applying methods for uncertainty estimating. As a result, the computed confidence values represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data set with addition arbitrary confidence values to ontology triples. Experimental results indicated that the proposed system is capable of running over the largest data set LUBM3000 in 1179 seconds inferring 350K triples.

An Approach of Scalable SHIF Ontology Reasoning using Spark Framework

Je-Min Kim, Young-Tack Park

http://doi.org/

For the management of a knowledge system, systems that automatically infer and manage scalable knowledge are required. Most of these systems use ontologies in order to exchange knowledge between machines and infer new knowledge. Therefore, approaches are needed that infer new knowledge for scalable ontology. In this paper, we propose an approach to perform rule based reasoning for scalable SHIF ontologies in a spark framework which works similarly to MapReduce in distributed memories on a cluster. For performing efficient reasoning in distributed memories, we focus on three areas. First, we define a data structure for splitting scalable ontology triples into small sets according to each reasoning rule and loading these triple sets in distributed memories. Second, a rule execution order and iteration conditions based on dependencies and correlations among the SHIF rules are defined. Finally, we explain the operations that are adapted to execute the rules, and these operations are based on reasoning algorithms. In order to evaluate the suggested methods in this paper, we perform an experiment with WebPie, which is a representative ontology reasoner based on a cluster using the LUBM set, which is formal data used to evaluate ontology inference and search speed. Consequently, the proposed approach shows that the throughput is improved by 28,400% (157k/sec) from WebPie(553/sec) with LUBM.

SPARQL Query Processing in Distributed In-Memory System

Batselem Jagvaral, Wangon Lee, Kang-Pil Kim, Young-Tack Park

http://doi.org/

In this paper, we propose a query processing approach that uses the Spark functional programming and distributed memory system to solve the computational overhead of SPARQL. In the semantic web, RDF ontology data is produced at large scale, and the main challenge for the semantic web is to query and manipulate such a large ontology with a high throughput. The most existing studies on SPARQL have focused on deploying the Hadoop MapReduce framework, and although approaches based on Hadoop MapReduce have shown promising results, they achieve a low level of throughput due to the underlying distributed file processes. Therefore, in order to speed up the query processes, we suggest query- processing methods that are based on memory caching in distributed memory system. Our approach is also integrated with a clause unification method for propagating between the clauses that exploits Spark join, map and filter methods along with caching. In our experiments, we have achieved a high level of performance relative to other approaches. In particular, our performance was nearly similar to that of Sempala, which has been considered to be the fastest query processing system.

Scalable RDFS Reasoning Using the Graph Structure of In-Memory based Parallel Computing

MyungJoong Jeon, ChiSeoung So, Batselem Jagvaral, KangPil Kim, Jin Kim, JinYoung Hong, YoungTack Park

http://doi.org/

In recent years, there has been a growing interest in RDFS Inference to build a rich knowledge base. However, it is difficult to improve the inference performance with large data by using a single machine. Therefore, researchers are investigating the development of a RDFS inference engine for a distributed computing environment. However, the existing inference engines cannot process data in real-time, are difficult to implement, and are vulnerable to repetitive tasks. In order to overcome these problems, we propose a method to construct an in-memory distributed inference engine that uses a parallel graph structure. In general, the ontology based on a triple structure possesses a graph structure. Thus, it is intuitive to design a graph structure-based inference engine. Moreover, the RDFS inference rule can be implemented by utilizing the operator of the graph structure, and we can thus design the inference engine according to the graph structure, and not the structure of the data table. In this study, we evaluate the proposed inference engine by using the LUBM1000 and LUBM3000 data to test the speed of the inference. The results of our experiment indicate that the proposed in-memory distributed inference engine achieved a performance of about 10 times faster than an in-storage inference engine.

ABox Realization Reasoning in Distributed In-Memory System

Wan-Gon Lee, Young-Tack Park

http://doi.org/

As the amount of knowledge information significantly increases, a lot of progress has been made in the studies focusing on how to reason large scale ontology effectively at the level of RDFS or OWL. These reasoning methods are divided into TBox classifications and ABox realizations. A TBox classification mainly deals with integrity and dependencies in schema, whereas an ABox realization mainly handles a variety of issues in instances. Therefore, the ABox realization is very important in practical applications. In this paper, we propose a realization method for analyzing the constraint of the specified class, so that the reasoning system automatically infers the classes to which instances belong. Unlike conventional methods that take advantage of the object oriented language based distributed file system, we propose a large scale ontology reasoning method using spark, which is a functional programming-based in-memory system. To verify the effectiveness of the proposed method, we used instances created from the Wine ontology by W3C(120 to 600 million triples). The proposed system processed the largest 600 million triples and generated 951 million triples in 51 minutes (696 K triple / sec) in our largest experiment.

MOnCa2: High-Level Context Reasoning Framework based on User Travel Behavior Recognition and Route Prediction for Intelligent Smartphone Applications

Je-Min Kim, Young-Tack Park

http://doi.org/

MOnCa2 is a framework for building intelligent smartphone applications based on smartphone sensors and ontology reasoning. In previous studies, MOnCa determined and inferred user situations based on sensor values represented by ontology instances. When this approach is applied, recognizing user space information or objects in user surroundings is possible, whereas determining the user’s physical context (travel behavior, travel destination) is impossible. In this paper, MOnCa2 is used to build recognition models for travel behavior and routes using smartphone sensors to analyze the user’s physical context, infer basic context regarding the user’s travel behavior and routes by adapting these models, and generate high-level context by applying ontology reasoning to the basic context for creating intelligent applications. This paper is focused on approaches that are able to recognize the user’s travel behavior using smartphone accelerometers, predict personal routes and destinations using GPS signals, and infer high-level context by applying realization.

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories

Je-Min Kim, Young-Tack Park

http://doi.org/

Current ontology studies use the Hadoop distributed storage framework to perform map-reduce algorithm-based reasoning for scalable ontologies. In this paper, however, we propose a novel approach for scalable Web Ontology Language (OWL) Horst Lite ontology reasoning, based on distributed cluster memories. Rule-based reasoning, which is frequently used for scalable ontologies, iteratively executes triple-format ontology rules, until the inferred data no longer exists. Therefore, when the scalable ontology reasoning is performed on computer hard drives, the ontology reasoner suffers from performance limitations. In order to overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories, using Spark (a memory-based distributed computing framework), which executes the ontology reasoning. In order to implement an appropriate OWL Horst Lite ontology reasoning system on Spark, our method divides the scalable ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. We used the Lehigh University Benchmark, which is used to evaluate ontology inference and search speed, to experimentally evaluate the methods suggested in this paper, which we applied to LUBM8000 (1.1 billion triples, 155 gigabytes). When compared with WebPIE, a representative mapreduce algorithm-based scalable ontology reasoner, the proposed approach showed a throughput improvement of 320% (62k/s) over WebPIE (19k/s).

Robust Particle Filter Based Route Inference for Intelligent Personal Assistants on Smartphones

Haejung Baek, Young Tack Park

http://doi.org/

Much research has been conducted on location-based intelligent personal assistants that can understand a user"s intention by learning the user"s route model and then inferring the user"s destinations and routes using data of GPS and other sensors in a smartphone. The intelligence of the location-based personal assistant is contingent on the accuracy and efficiency of the real-time predictions of the user"s intended destinations and routes by processing movement information based on uncertain sensor data. We propose a robust particle filter based on Dynamic Bayesian Network model to infer the user"s routes. The proposed robust particle filter includes a particle generator to supplement the incorrect and incomplete sensor information, an efficient switching function and an weight function to reduce the computation complexity as well as a resampler to enhance the accuracy of the particles. The proposed method improves the accuracy and efficiency of determining a user"s routes and destinations.

Real-time and Parallel Semantic Translation Technique for Large-Scale Streaming Sensor Data in an IoT Environment

SoonHyun Kwon, Dongwan Park, Hyochan Bang, Youngtack Park

http://doi.org/

Nowadays, studies on the fusion of Semantic Web technologies are being carried out to promote the interoperability and value of sensor data in an IoT environment. To accomplish this, the semantic translation of sensor data is essential for convergence with service domain knowledge. The existing semantic translation technique, however, involves translating from static metadata into semantic data(RDF), and cannot properly process real-time and large-scale features in an IoT environment. Therefore, in this paper, we propose a technique for translating large-scale streaming sensor data generated in an IoT environment into semantic data, using real-time and parallel processing. In this technique, we define rules for semantic translation and store them in the semantic repository. The sensor data is translated in real-time with parallel processing using these pre-defined rules and an ontology-based semantic model. To improve the performance, we use the Apache Storm, a real-time big data analysis framework for parallel processing. The proposed technique was subjected to performance testing with the AWS observation data of the Meteorological Administration, which are large-scale streaming sensor data for demonstration purposes.

Distributed Table Join for Scalable RDFS Reasoning on Cloud Computing Environment

Wan-Gon Lee, Je-Min Kim, Young-Tack Park

http://doi.org/

The Knowledge service system needs to infer a new knowledge from indicated knowledge to provide its effective service. Most of the Knowledge service system is expressed in terms of ontology. The volume of knowledge information in a real world is getting massive, so effective technique for massive data of ontology is drawing attention. This paper is to provide the method to infer massive data-ontology to the extent of RDFS, based on cloud computing environment, and evaluate its capability. RDFS inference suggested in this paper is focused on both the method applying MapReduce based on RDFS meta table, and the method of single use of cloud computing memory without using MapReduce under distributed file computing environment. Therefore, this paper explains basically the inference system structure of each technique, the meta table set-up according to RDFS inference rule, and the algorithm of inference strategy. In order to evaluate suggested method in this paper, we perform experiment with LUBM set which is formal data to evaluate ontology inference and search speed. In case LUBM6000, the RDFS inference technique based on meta table had required 13.75 minutes(inferring 1,042 triples per second) to conduct total inference, whereas the method applying the cloud computing memory had needed 7.24 minutes(inferring 1,979 triples per second) showing its speed twice faster.


Search




Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr