Search : [ keyword: data processing ] (8)

A Greedy Rule Allocation Algorithm for Efficient Distributed Complex Event Processing

Yooju Shin, Jae-Gil Lee

http://doi.org/10.5626/JOK.2019.46.12.1222

Complex event processing (CEP) is event processing over multiple stream sources that infers events indicating complicated circumstances. As stream data grows in size, CEP engines have been parallelized to benefit from distributed computing. However, if stream data and CEP rules are allocated without considering the resulting computational cost on each engine, distributed CEP can replicate stream data redundantly and increase latency. In this paper, we propose an efficient rule allocation algorithm to prevent such situations. The algorithm assigns a priority to each event rule and allocates rules in priority order, placing each rule on the engine that minimizes the increase in the value of the proposed cost function. We demonstrate the superiority of our algorithm in two tests. In the optimization verification test, our algorithm produces results closer to the optimum than the other algorithms. In the performance test, our algorithm shows lower latency and a lower data replication ratio in a distributed CEP system using a real-world dataset and event rules.
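
The greedy allocation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the cost function or the priority rule, so both are assumptions here: the hypothetical `engine_cost` counts the streams an engine must receive plus a quadratic load term, and rules that read more streams are allocated first.

```python
from typing import Dict, List, Set


def engine_cost(streams: Set[str], n_rules: int) -> int:
    # Hypothetical cost (not the paper's): streams the engine must receive
    # (a proxy for replication) plus a quadratic load term for its rules.
    return len(streams) + n_rules ** 2


def allocate_rules(rules: Dict[str, Set[str]], n_engines: int) -> List[List[str]]:
    assigned: List[List[str]] = [[] for _ in range(n_engines)]
    streams: List[Set[str]] = [set() for _ in range(n_engines)]

    # Assumed priority: rules that read more streams go first (ties by name).
    for rule in sorted(rules, key=lambda r: (-len(rules[r]), r)):

        def increase(e: int) -> int:
            # Cost increase on engine e if this rule were placed there.
            before = engine_cost(streams[e], len(assigned[e]))
            after = engine_cost(streams[e] | rules[rule], len(assigned[e]) + 1)
            return after - before

        # Pick the engine with the smallest increase (lowest index on ties).
        best = min(range(n_engines), key=lambda e: (increase(e), e))
        assigned[best].append(rule)
        streams[best] |= rules[rule]
    return assigned


rules = {"r1": {"a", "b"}, "r2": {"a", "b"}, "r3": {"c"}, "r4": {"c"}}
allocation = allocate_rules(rules, 2)
print(allocation)
```

In this toy input, rules that share streams end up on the same engine, so no stream has to be sent to both engines.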

An Offloading Scheme for Reliable Data Processing of Swarm-drones

Hong Min, Bongjae Kim, Junyoung Heo, Jinman Jung

http://doi.org/10.5626/JOK.2018.45.10.990

As drone-related technologies develop, autonomous drones are finding many applications. The offloading technique executes computationally intensive tasks in the cloud to preserve the limited resources of a drone. In this paper, we determine the effect of offloading for swarm-drones through a cost analysis that considers task completion time and energy consumption. If offloading a task to the cloud would take more time and consume more energy, the drones divide the large task into small tasks and run them using their own resources to process data reliably and efficiently. Our simulation results also show how task completion time and energy consumption influence the offloading decision.
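
As a rough illustration of such a cost comparison, the sketch below estimates time and energy for local execution and for offloading, then keeps the task on the drones only when offloading is worse on both counts. The cost formulas, parameter names, and numbers are all assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    time_s: float
    energy_j: float


def local_cost(cycles: float, cpu_hz: float, cpu_power_w: float) -> Cost:
    # Assumed model: run time is cycles / clock rate, energy is power * time.
    t = cycles / cpu_hz
    return Cost(t, t * cpu_power_w)


def offload_cost(data_bits: float, bandwidth_bps: float, tx_power_w: float,
                 cloud_time_s: float, idle_power_w: float) -> Cost:
    # Assumed model: transmit the input, then idle while the cloud computes.
    tx = data_bits / bandwidth_bps
    return Cost(tx + cloud_time_s, tx * tx_power_w + cloud_time_s * idle_power_w)


def should_offload(local: Cost, remote: Cost) -> bool:
    # Stay local only if offloading costs more time AND more energy.
    return not (remote.time_s > local.time_s and remote.energy_j > local.energy_j)


local = local_cost(cycles=2e9, cpu_hz=1e9, cpu_power_w=5.0)
fast_link = offload_cost(8e6, 4e6, tx_power_w=2.0, cloud_time_s=0.5, idle_power_w=1.0)
slow_link = offload_cost(8e6, 1e6, tx_power_w=2.0, cloud_time_s=0.5, idle_power_w=1.0)
print(should_offload(local, fast_link), should_offload(local, slow_link))
```

With the slow link, offloading loses on both time and energy, which is the case where the abstract says the drones split the task and process it themselves.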

CEP Rule Distribution Algorithm for In-network Processing in an IoT Network Environment

Sunghoon Park, Sanghwa Chung

http://doi.org/10.5626/JOK.2018.45.7.722

As the number of IoT devices increases, the data coming from these devices is also increasing exponentially. The data generated by devices is stored and managed through a database-centered system structure. However, the existing database approach is limited in its ability to manage the surging data, both in maintenance cost and in real-time processing. To overcome these limitations, Complex Event Processing (CEP), which processes as much data as possible within the network, has emerged, and data processing is being carried out using this strategy. In this paper, we propose a CEP rule distribution algorithm that reduces server burden and guarantees network performance by distributing CEP rules in an IoT environment. To validate it, we perform a small experiment using open-source components such as OpenWSN and TelosB nodes, and verify the mitigation of server load and the data processing performance achieved by the algorithm.

Distributed Processing Method of Hotspot Spatial Analysis Based on Hadoop and Spark

Changsoo Kim, Joosub Lee, KyuMoon Hwang, Hyojin Sung

http://doi.org/10.5626/JOK.2018.45.2.99

Hotspot analysis, a form of spatial statistical analysis, is one of the easiest ways to see spatial patterns. It is based on the concept that "adjacent ones are more relevant than those that are far away". However, because hotspot analysis must take spatial adjacency into account, distributed processing is not easy. In this paper, we propose a distributed algorithm design for hotspot spatial analysis. We compared the performance of a standalone system with Hadoop-based and Spark-based processing. Compared to the standalone system, the performance improvement rate was 625.89% for Hadoop and 870.14% for Spark. Furthermore, the performance improvement rate of Spark was higher than that of Hadoop as the data set grew larger.

Squall: A Real-time Big Data Processing Framework based on TMO Model for Real-time Events and Micro-batch Processing

http://doi.org/

Recently, the importance of velocity, one of the characteristics of big data (5V: Volume, Variety, Velocity, Veracity, and Value), has been emphasized in data processing, which has led to several studies on real-time stream processing, a technology for quick and accurate processing and analysis of big data. In this paper, we propose the Squall framework, which uses Time-triggered Message-triggered Object (TMO) technology, a model that is widely used for processing real-time big data. We also describe the Squall framework and its operation on a single node. TMO is an object model that supports non-regular real-time processing triggered by certain conditions as well as regular periodic processing over a certain amount of time. The Squall framework supports real-time event streams of big data and micro-batch processing with outstanding performance compared to Apache Storm and Spark Streaming. However, additional development is needed to process real-time streams on multiple nodes, which is common in most frameworks. In conclusion, the advantages of the TMO model can overcome the drawbacks of Apache Storm or Spark Streaming in processing real-time big data, and the TMO model has potential as a useful model for real-time big data processing.

Finding the Minimum MBRs Embedding K Points

Keonwoo Kim, Younghoon Kim

http://doi.org/

The use of mobile devices equipped with GPS sensors, such as smartphones, has recently surged. This trend enables the posting of geo-tagged messages (i.e., multimedia messages with GPS locations) on social media such as Twitter and Facebook, and the volume of such spatial data is growing rapidly. However, the relationships between the location and content of messages are not always explicitly shown in such geo-tagged messages. Thus, the need arises to reorganize search results to find the relationship between keywords and the spatial distribution of messages. We find the smallest minimum bounding rectangle (MBR) that embeds k or more points in order to find the densest rectangle of data, which can be useful in location search systems. In this paper, we suggest efficient algorithms to discover a group of 2-dimensional spatial data within a close distance, such as an MBR. The efficiency of our proposed algorithms is confirmed experimentally with synthetic and real data sets.
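
To make the problem statement concrete, the following is a naive brute-force baseline, not one of the efficient algorithms the paper proposes. It assumes axis-aligned rectangles whose sides pass through point coordinates and measures size by area; the function name `smallest_mbr` is hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def smallest_mbr(points: List[Point], k: int) -> Optional[Rect]:
    """Return the smallest-area axis-aligned rectangle holding >= k points."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    best: Optional[Rect] = None
    best_area = float("inf")
    # An optimal rectangle can be shrunk until each side touches a point,
    # so only point coordinates need to be tried as boundaries.
    for x1, x2 in combinations_with_replacement(xs, 2):
        for y1, y2 in combinations_with_replacement(ys, 2):
            inside = sum(1 for x, y in points
                         if x1 <= x <= x2 and y1 <= y <= y2)
            area = (x2 - x1) * (y2 - y1)
            if inside >= k and area < best_area:
                best, best_area = (x1, y1, x2, y2), area
    return best  # None if fewer than k points exist


points = [(0, 0), (1, 1), (1, 2), (2, 1), (10, 10)]
mbr = smallest_mbr(points, 3)
print(mbr)
```

This enumeration tries every pair of x and y boundaries and rescans all points for each candidate, which is why more efficient algorithms are needed for large data sets.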

A Design of Effective Inference Methods and Their Application Guidelines for Supporting Various Medical Analytics Schemes

Moon Kwon Kim, Hyun Jung La, Soo Dong Kim

http://doi.org/

As a variety of personal medical devices appear, it has become possible to acquire a large number of diverse medical contexts from the devices. There have been efforts to analyze the medical contexts via software applications. In this paper, we propose a generic model of medical analytics schemes that are used by medical experts, identify inference methods for realizing each medical analytics scheme, and present guidelines for applying the inference methods to the medical analytics schemes. Additionally, we develop a PoC inference system and analyze real medical contexts to diagnose relevant diseases so that we can validate the feasibility and effectiveness of the proposed medical analytics schemes and the guidelines for applying inference methods.

Grid-based Index Generation and k-nearest-neighbor Join Query-processing Algorithm using MapReduce

Miyoung Jang, Jae Woo Chang

http://doi.org/

MapReduce provides high levels of system scalability and fault tolerance for large-scale data processing. A MapReduce-based k-nearest-neighbor (k-NN) join algorithm seeks to produce the k nearest neighbors of each point of a dataset from another dataset. The algorithm has been considered important in big data analysis. However, the existing k-NN join query-processing algorithm suffers from a high index-construction cost that makes it unsuitable for processing big data. To solve this problem, we propose a new grid-based k-NN join query-processing algorithm. Our algorithm retrieves only the data neighboring a query cell and sends them to each MapReduce task, making it possible to reduce the overhead of data transmission and computation. Our performance analysis shows that our algorithm outperforms the existing scheme by up to seven-fold in terms of query-processing time, while also achieving a high level of query-result accuracy.
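
The idea of searching only the cells around a query cell can be sketched on a single machine as follows. This is an assumption-laden illustration, not the paper's MapReduce algorithm: it uses a uniform grid with a hypothetical `cell` size, looks only at the query cell and its eight neighbors, and can therefore miss true neighbors that lie farther away.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def build_grid(points: List[Point], cell: float) -> Dict[Cell, List[Point]]:
    # Index each point by the grid cell that contains it.
    grid: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        grid[(int(p[0] // cell), int(p[1] // cell))].append(p)
    return grid


def knn_join(r_points: List[Point], s_points: List[Point],
             k: int, cell: float) -> Dict[Point, List[Point]]:
    grid = build_grid(s_points, cell)
    result: Dict[Point, List[Point]] = {}
    for q in r_points:
        cx, cy = int(q[0] // cell), int(q[1] // cell)
        # Candidates come only from the query cell and its 8 neighbors.
        candidates = [p
                      for dx in (-1, 0, 1)
                      for dy in (-1, 0, 1)
                      for p in grid.get((cx + dx, cy + dy), [])]
        candidates.sort(key=lambda p: math.dist(q, p))
        result[q] = candidates[:k]  # approximate: may hold fewer than k
    return result


s = [(0.5, 0.5), (1.5, 0.5), (5.5, 5.5), (0.6, 0.4)]
r = [(0.4, 0.4)]
joined = knn_join(r, s, k=2, cell=1.0)
print(joined)
```

Restricting the candidate set this way trades exactness for less data movement, which is consistent with the abstract reporting query-result accuracy as a separate metric.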


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr