Search : [ author: 이진우 ] (2)

A Sort and Merge Method for Genome Variant Call Format (GVCF) Files using Parallel and Distributed Computing

JinWoo Lee, Jung-Im Won, JeeHee Yoon

http://doi.org/10.5626/JOK.2021.48.3.358

With the development of next-generation sequencing (NGS) techniques, a large volume of genomic data is being produced and accumulated, and parallel and distributed computing has become an essential tool for processing it. Generally, NGS data processing entails two main steps: obtaining read alignment results in BAM format and extracting variant information in genome variant call format (GVCF) or variant call format (VCF). However, each step requires a long execution time due to the size of the data. In this study, we propose a new GVCF file sorting/merging module that runs on a distributed parallel cluster to shorten the execution time. The proposed algorithm uses Spark as the distributed parallel computing framework. To use cluster resources efficiently, the sorting/merging process is performed in two steps that follow the structural characteristics of the GVCF file. Performance was evaluated by comparing the sorting and merging execution time of our method with that of GATK's CombineGVCFs module on multiple GVCF files. The results suggest that the proposed method is effective in reducing execution time. The method can be used as a scalable and powerful distributed computing tool for the GVCF file sorting/merging problem.
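The abstract does not say what the two steps are. As a rough illustration only, one plausible reading is to group records by contig first and then sort each group by position. The sketch below uses plain Python rather than Spark. The function names `parse_record` and `sort_and_merge`, the `contig_order` argument, and the simplified record layout are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def parse_record(line):
    """Split a tab-separated GVCF data line into (contig, position, raw line)."""
    raw = line.rstrip("\n")
    fields = raw.split("\t")
    return fields[0], int(fields[1]), raw


def sort_and_merge(files, contig_order):
    """Interleave records from several GVCF inputs into one sorted list.

    `files` is an iterable of line iterables; `contig_order` lists the
    contigs in the order they should appear in the output (hypothetical
    parameter; a real tool would read this from the header).
    """
    # Step 1 (assumed): bucket the records of every input by contig.
    buckets = defaultdict(list)
    for lines in files:
        for line in lines:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            contig, pos, raw = parse_record(line)
            buckets[contig].append((pos, raw))

    # Step 2 (assumed): sort each bucket by position, then emit the
    # buckets in contig order. Buckets are independent of one another,
    # which is what would let a cluster process them in parallel.
    merged = []
    for contig in contig_order:
        for _pos, raw in sorted(buckets.get(contig, []), key=lambda r: r[0]):
            merged.append(raw)
    return merged
```

This sketch only interleaves records in coordinate order. It does not combine per-sample records at the same site into multi-sample records, which a full GVCF merge would also need to do.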

A Fast and Scalable Image Retrieval Algorithms by Leveraging Distributed Image Feature Extraction on MapReduce

Hwan-Jun Song, Jin-Woo Lee, Jae-Gil Lee

http://doi.org/

As the performance of mobile devices improves markedly in the age of the Internet of Things (IoT), there is growing demand for rapid processing of large volumes of multimedia data. However, because research on image search has focused mainly on maintaining accuracy under environmental changes, progress on fast processing of high-resolution multimedia queries has been slow. Hence, we propose a new distributed image search algorithm that ensures both high accuracy and rapid response through distributed image feature extraction on MapReduce, and that addresses memory scalability through BIRCH indexing. In addition, we conducted experiments on the accuracy, processing time, and scalability of the algorithm, which confirmed its excellent performance.
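The idea of extracting features in a map step can be illustrated with a small sketch. It is not the authors' algorithm: the feature used here (a normalized grayscale histogram), the linear nearest-neighbor lookup that stands in for BIRCH indexing, and the names `extract_feature`, `map_phase`, and `nearest` are all assumptions chosen to keep the example self-contained.

```python
import math


def extract_feature(pixels, bins=4):
    """Turn 8-bit grayscale pixel values into a normalized histogram."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1  # avoid division by zero for empty input
    return [h / total for h in hist]


def map_phase(images):
    """Map step: each (image_id, pixels) pair is processed independently,
    so the work could be spread across many workers."""
    return [(image_id, extract_feature(pixels)) for image_id, pixels in images]


def nearest(index, query_pixels):
    """Return the id of the indexed image closest to the query.

    A linear scan is used here for brevity; it stands in for the
    index structure described in the abstract.
    """
    q = extract_feature(query_pixels)
    return min(index, key=lambda entry: math.dist(entry[1], q))[0]
```

A query is answered by extracting the same feature from the query image and comparing it against the precomputed index, so the expensive extraction over the image collection happens once, in the map step.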


Journal of KIISE

  • ISSN: 2383-630X (Print)
  • ISSN: 2383-6296 (Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr