Journal of KIISE

Search : [ keyword: Distributed Processing ] (4)

Distributed Processing of Deep Learning Inference Models for Data Stream Classification

The growing volume of data streams has led to increased use of deep learning. To classify data streams with deep learning, the model must be executed in real time through serving. Unfortunately, a served model incurs long latency due to gRPC or HTTP communication. In addition, if the served model uses a highly complex stacking ensemble method, the latency becomes even longer. To address this latency problem, we proposed distributed processing solutions for data stream classification using Apache Storm. First, we proposed a real-time distributed inference method based on Apache Storm to reduce the long latency of the existing serving method. Our experimental results showed that the proposed distributed inference method reduces the latency by up to 11 times compared to the existing serving method. Second, to reduce the long latency of the stacking-based inference model for detecting malicious URLs, we proposed four distributed processing techniques for classifying URL streams in real time. The proposed techniques are Independent Stacking, Sequential Stacking, Semi-Sequential Stacking, and Stepwise-Independent Stacking. Our experimental results showed that Stepwise-Independent Stacking, which combines characteristics of independent execution and sequential processing, is the best technique for classifying URL streams, achieving the shortest latency.
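As a rough illustration of stacking-based inference, the sketch below runs several base models independently on the same URL and then passes their scores to a meta-model. It is not one of the four techniques named in the abstract, and it does not use Apache Storm. The base models, the meta-model, and the 0.3 threshold are all hypothetical stand-ins chosen only to make the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical base models: each maps a URL to a score in [0, 1].
def length_model(url):
    return min(len(url) / 100.0, 1.0)

def digit_model(url):
    return sum(ch.isdigit() for ch in url) / max(len(url), 1)

BASE_MODELS = [length_model, digit_model]

# Hypothetical meta-model: flags the URL when the mean base score is high.
def meta_model(scores):
    return sum(scores) / len(scores) > 0.3

def stack_predict(url):
    """Run the base models independently, then combine their scores."""
    with ThreadPoolExecutor(max_workers=len(BASE_MODELS)) as pool:
        # Submit every base model at once so none waits on another.
        futures = [pool.submit(model, url) for model in BASE_MODELS]
        scores = [f.result() for f in futures]  # keeps model order
    return meta_model(scores)

print(stack_predict("http://example.com"))
```

In a sequential variant, each base model would instead be called one after another before the meta-model runs; the final prediction is the same, only the latency profile differs.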

An Efficient Distributed In-memory High-dimensional Indexing Scheme for Content-based Image Retrieval in Spark Environments

Dojin Choi, Songhee Park, Yeondong Kim, Jiwon Wee, Hyeonbyeong Lee, Jongtae Lim, Kyoungsoo Bok, Jaesoo Yoo

http://doi.org/10.5626/JOK.2020.47.1.95

Content-based image retrieval, which searches for an object in images, has been utilized for criminal activity monitoring and object tracking in video. In this paper, we propose a distributed in-memory high-dimensional indexing scheme for content-based image retrieval. It provides similarity search over massive feature vectors extracted from images or objects. To process a large amount of data, we utilize the big data platform Spark. Moreover, we employ a master/slave model for efficient allocation of distributed query processing. The master distributes data and queries, and the slaves index and process them. To solve the k-NN query processing performance problems of existing distributed high-dimensional indexing schemes, we propose optimization methods for k-NN query processing that consider density and search costs. We conduct various performance evaluations to demonstrate the superiority of the proposed scheme.
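The master/slave division of labor for a k-NN query can be sketched in plain Python as follows. This is a minimal brute-force illustration, assuming feature vectors are already split into partitions. It does not use Spark, builds no index, and omits the density and search-cost optimizations the abstract mentions. The function names and sample data are hypothetical.

```python
import heapq
import math

def local_knn(partition, query, k):
    """Slave side: the k nearest (distance, id) pairs in one partition."""
    scored = ((math.dist(vec, query), vid) for vid, vec in partition)
    return heapq.nsmallest(k, scored)

def distributed_knn(partitions, query, k):
    """Master side: gather local candidates and keep the global top k."""
    candidates = []
    for partition in partitions:  # on a cluster, each call would run remotely
        candidates.extend(local_knn(partition, query, k))
    return heapq.nsmallest(k, candidates)

# Hypothetical 2-D feature vectors spread over two partitions.
partitions = [
    [("a", (0.0, 0.0)), ("b", (5.0, 5.0))],
    [("c", (1.0, 1.0)), ("d", (0.5, 0.0))],
]
result = distributed_knn(partitions, (0.0, 0.0), k=2)
print([vid for _, vid in result])
```

Because every partition returns its own k best candidates, the merged result is exact: any vector in the global top k must also be in the top k of its own partition.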

Optimization of Distributed Binary Bernoulli Sampling

Wonhyeong Cho, Myeong-Seon Gil, Namsu Ju, Yang-Sae Moon

http://doi.org/10.5626/JOK.2019.46.12.1322

This paper proposes a method to improve the performance of Binary Bernoulli Sampling (BBS). BBS is a sampling technique suitable for multi-source stream environments. A recent approach performs distributed processing of BBS on Apache Storm using a multi-coordinator structure. However, this approach introduces a coordinator waiting problem, which limits the performance improvement. In this paper, we solve the coordinator waiting problem by introducing a multi-distribution structure and a distributor separation structure. The multi-distribution structure enables multiple coordinators, rather than one, to participate in the distribution, minimizing the coordinator waiting time. The distributor separation structure moves the distributing function from the coordinators to the distributors, maximizing the processing performance. We perform various experiments by implementing our proposed structure on the Storm-based distributed BBS. The experimental results show that our structure improves the performance by up to 90 times compared to the previous distributed BBS.
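For readers unfamiliar with Bernoulli sampling in general, the sketch below shows the plain single-source form, in which each stream item is kept independently with probability p. It is not BBS itself and models none of the coordinator or distributor structures described in the abstract. The function name and parameters are hypothetical.

```python
import random

def bernoulli_sample(stream, p, seed=None):
    """Yield each item of the stream independently with probability p."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    for item in stream:
        if rng.random() < p:  # rng.random() is uniform on [0, 1)
            yield item

sample = list(bernoulli_sample(range(10_000), p=0.1, seed=42))
print(len(sample))
```

The sample size is random, with an expected value of p times the stream length, so the sampler needs no knowledge of the stream length in advance.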

An Efficient Continuous Subgraph Matching Scheme Considering Data Reuse

Dojin Choi, Kyoungsoo Bok, Jaesoo Yoo

http://doi.org/10.5626/JOK.2019.46.8.842

With the increasing use of graph streams in various applications, a continuous subgraph matching scheme is required to search for subgraphs that change in real time. In this paper, we propose an efficient continuous subgraph matching scheme that reuses indexing and performs distributed processing in graph stream environments. To perform distributed processing, we propose a query decomposition method based on the degree and manage the decomposed subqueries as an index. The proposed scheme reuses indexing information to reduce the index load that arises when multiple queries are entered. We also allocate queries through a cost model that calculates the indexing load of each server. For efficient distributed processing in stream environments, the proposed scheme was implemented in Storm. Various performance evaluations were conducted to demonstrate the superiority of the proposed scheme.

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr

Digital Library[ Search Result ]

Distributed Processing of Deep Learning Inference Models for Data Stream Classification

An Efficient Distributed In-memory High-dimensional Indexing Scheme for Content-based Image Retrieval in Spark Environments

Optimization of Distributed Binary Bernoulli Sampling

An Efficient Continuous Subgraph Matching Scheme Considering Data Reuse
