Search : [ keyword: data processing ] (7)

Predicting the Cache Performance Benefits for In-memory Data Analytics Frameworks

Minseop Jeong, Hwansoo Han

http://doi.org/10.5626/JOK.2021.48.5.479

In-memory data analytics frameworks provide caching facilities for intermediate results to improve performance. For effective caching, the actual performance benefits from cached data should be taken into consideration. As existing frameworks only measure execution times at the distributed task level, they are limited in how accurately they can predict the performance benefits of caching. In this paper, we propose an operator-level time measurement method, which combines the existing task-level execution time measurement with our cost prediction model based on input data sizes. Based on the proposed model and the execution flow of the application, we propose a prediction method for the performance benefits from data caching. Our proposed model provides opportunities for cache optimization with predicted performance benefits. Our cost model for operators showed a prediction error rate of 7.3% on average when measured with 10x input data. The difference between predicted performance and actual performance was limited to within 24%.

A Greedy Rule Allocation Algorithm for Efficient Distributed Complex Event Processing

Yooju Shin, Jae-Gil Lee

http://doi.org/10.5626/JOK.2019.46.12.1222

Complex event processing (CEP) is defined as event processing over multiple stream sources to infer events that suggest complicated circumstances. As the size of stream data becomes larger, CEP engines have been parallelized to benefit from distributed computing. However, distributed CEP can replicate stream data redundantly and increase latency if the computational cost on each engine after the allocation of stream data and CEP rules is not considered. In this paper, we suggest an efficient rule allocation algorithm to prevent such situations. This algorithm determines the priorities of event rules for the allocation, wherein the rule with higher priority is allocated first to the engine that minimizes the increase in the value of the proposed cost function. We demonstrate the superiority of our algorithm in two tests. In the optimization verification test, our algorithm achieves the results closest to the optimal results compared with the other algorithms. In the performance test, our algorithm shows lower latency and a lower data replication ratio in the distributed CEP system using a real-world dataset and event rules.
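
The greedy allocation step summarized in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the cost function or the priority rule, so `cost_increase` (newly replicated streams plus the engine's current rule count) and the caller-supplied `priority` list are hypothetical stand-ins.

```python
from typing import Dict, List, Set


def allocate_rules(
    rules: Dict[str, Set[str]],  # rule name -> stream sources the rule reads
    priority: List[str],         # rule names, highest priority first (assumed given)
    num_engines: int,
) -> List[List[str]]:
    """Assign each rule, in priority order, to the engine whose cost grows least."""
    engine_rules: List[List[str]] = [[] for _ in range(num_engines)]
    engine_streams: List[Set[str]] = [set() for _ in range(num_engines)]

    def cost_increase(engine: int, rule: str) -> int:
        # Hypothetical cost: streams that would have to be newly replicated to
        # this engine, plus its current rule count as a proxy for load.
        new_streams = rules[rule] - engine_streams[engine]
        return len(new_streams) + len(engine_rules[engine])

    for rule in priority:
        # min() returns the first engine with the smallest increase on ties.
        best = min(range(num_engines), key=lambda e: cost_increase(e, rule))
        engine_rules[best].append(rule)
        engine_streams[best] |= rules[rule]
    return engine_rules


example_rules = {"r1": {"a", "b"}, "r2": {"a", "b"}, "r3": {"c"}}
allocation = allocate_rules(example_rules, ["r1", "r2", "r3"], num_engines=2)
```

In this example, `r1` and `r2` share the same streams and land on the same engine, which avoids replicating streams `a` and `b`, while `r3` goes to the idle engine.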

An Offloading Scheme for Reliable Data Processing of Swarm-drones

Hong Min, Bongjae Kim, Junyoung Heo, Jinman Jung

http://doi.org/10.5626/JOK.2018.45.10.990

With the development of drone-related technologies, autonomous drones have many applications. The offloading technique is used to execute computationally intensive tasks in the cloud to preserve the limited resources of a drone. In this paper, we determine the effect of offloading by using cost analysis for swarm-drones considering task completion time and energy consumption. If the drones take more time and spend more energy while offloading their tasks to the cloud, drones divide a large task into small tasks. These tasks are run by using the drone’s own resources to process data reliably and efficiently. Our simulation results also show how the task completion time and the energy consumption influence the offloading decision.
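
The offloading decision described in this abstract can be illustrated with a small sketch. The names `Cost`, `should_offload`, and `split_task` are hypothetical, and the abstract does not say how mixed cases are handled (for example, faster but more energy-hungry offloading), so this sketch assumes such cases default to offloading.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cost:
    time_s: float    # task completion time in seconds
    energy_j: float  # energy consumption in joules


def should_offload(local: Cost, offload: Cost) -> bool:
    """Return False only when offloading costs more time AND more energy."""
    # Assumption: mixed cases (worse on one metric only) still offload.
    worse_on_both = (
        offload.time_s > local.time_s and offload.energy_j > local.energy_j
    )
    return not worse_on_both


def split_task(total_units: int, chunk_units: int) -> List[int]:
    """Divide a large task into smaller chunks to run on the drone itself."""
    if chunk_units <= 0:
        raise ValueError("chunk_units must be positive")
    full, rest = divmod(total_units, chunk_units)
    return [chunk_units] * full + ([rest] if rest else [])


local_cost = Cost(time_s=10.0, energy_j=50.0)
cloud_cost = Cost(time_s=12.0, energy_j=60.0)
if should_offload(local_cost, cloud_cost):
    plan = "offload"
else:
    plan = split_task(total_units=10, chunk_units=4)
```

Here offloading is worse on both metrics, so the task stays on the drone and is split into chunks of at most four units.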

CEP Rule Distribution Algorithm for In-network Processing in an IoT Network Environment

Sunghoon Park, Sanghwa Chung

http://doi.org/10.5626/JOK.2018.45.7.722

As the number of IoT devices increases, the data coming from devices is also increasing exponentially. The data generated from devices are stored and managed through a system structure using a database. However, when managing this surging data, the existing database approach is limited in terms of maintenance costs and real-time processing. To overcome these limitations, Complex Event Processing (CEP), which processes data as much as possible within the network, has emerged, and data processing is being carried out using this strategy. In this paper, we propose a CEP Rule distribution algorithm which can reduce server burden and guarantee network performance through distribution of the CEP Rule in an IoT environment. To prove this, we perform a small experiment using open source, such as OpenWSN and TelosB nodes, and verify the mitigation of server load and the performance of data processing according to the algorithm.

Distributed Processing Method of Hotspot Spatial Analysis Based on Hadoop and Spark

Changsoo Kim, Joosub Lee, KyuMoon Hwang, Hyojin Sung

http://doi.org/10.5626/JOK.2018.45.2.99

Hotspot analysis, a type of spatial statistical analysis, is an easy method for identifying spatial patterns. It is based on the concept that "adjacent ones are more relevant than those that are far away". However, because hotspot analysis must take spatial adjacency into account, distributed processing is not easy. In this paper, we propose a distributed algorithm design for hotspot spatial analysis. Its performance was compared across a standalone system and Hadoop- and Spark-based processing. Compared with the standalone system, the performance improvement rate was 625.89% for Hadoop and 870.14% for Spark. Furthermore, the performance improvement rate of Spark was higher than that of Hadoop as the data set became larger.

A Comparative Analysis of Recursive Query Algorithm Implementations based on High Performance Distributed In-Memory Big Data Processing Platforms

Minseo Kang, Jaesung Kim, Jaegil Lee

http://doi.org/

Recursive query algorithms are used in many social network services, e.g., for reachability queries in social networks. Recently, the size of social network data has increased as social network services evolve. As a result, it is almost impossible to run a recursive query algorithm on a single machine. In this paper, we implement recursive queries on two popular in-memory distributed platforms, Spark and Twister, to solve this problem. We evaluate the performance of the two implementations using 50 machines on Amazon EC2 and two real-world data sets: LiveJournal and ClueWeb. The results show that the recursive query algorithm performs better on Spark for the LiveJournal input data set, which has a relatively high average degree but fewer vertices. However, the recursive query on Twister is superior to Spark for the ClueWeb input data set, which has a relatively low average degree but many vertices.
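
As a point of reference for the kind of query evaluated here, a reachability query can be computed by repeatedly expanding a frontier until no new vertices appear. The sketch below is a single-machine illustration in plain Python, not the Spark or Twister implementation from the paper; the function name `reachable` and the edge-list input are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple


def reachable(edges: Iterable[Tuple[int, int]], source: int) -> Set[int]:
    """Return every vertex reachable from `source` in a directed graph."""
    # Build an adjacency map: vertex -> set of direct successors.
    adjacency: Dict[int, Set[int]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)

    visited = {source}
    frontier = {source}
    # Each iteration is one recursion step: follow edges out of the frontier
    # and keep only vertices that have not been seen before.
    while frontier:
        successors: Set[int] = set()
        for u in frontier:
            successors |= adjacency.get(u, set())
        frontier = successors - visited
        visited |= frontier
    return visited


graph = [(1, 2), (2, 3), (3, 1), (4, 5)]
result = reachable(graph, source=1)
```

Only the newly discovered vertices are expanded in each round, so the loop stops as soon as an iteration adds nothing, even when the graph contains cycles.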

A Design of Effective Inference Methods and Their Application Guidelines for Supporting Various Medical Analytics Schemes

Moon Kwon Kim, Hyun Jung La, Soo Dong Kim

http://doi.org/

As a variety of personal medical devices appear, it is possible to acquire a large number of diverse medical contexts from the devices. There have been efforts to analyze the medical contexts via software applications. In this paper, we propose a generic model of medical analytics schemes that are used by medical experts, identify inference methods for realizing each medical analytics scheme, and present guidelines for applying the inference methods to the medical analytics schemes. Additionally, we develop a PoC inference system and analyze real medical contexts to diagnose relevant diseases so that we can validate the feasibility and effectiveness of the proposed medical analytics schemes and the guidelines for applying inference methods.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr