Design of a Large-scale Task Dispatching & Processing System based on Hadoop
Jik-Soo Kim, Nguyen Cao, Seoyoung Kim, Soonwook Hwang
This paper presents MOHA (Many-Task Computing on Hadoop), a framework that aims to apply Many-Task Computing (MTC) technologies, originally developed for high-performance processing of many tasks, to the existing Big Data processing platform Hadoop. We present the basic concepts, motivation, preliminary results of a proof of concept (PoC) based on a distributed message queue, and future research directions of MOHA. MTC applications may have relatively low I/O requirements per task. However, a very large number of tasks must be processed efficiently, with potentially heavy file-based inter-task communication. MTC applications can therefore exhibit a different pattern of data-intensive workload from existing Hadoop applications, which are typically based on relatively large data block sizes. Through an effective convergence of MTC and Big Data technologies, the MOHA framework can support large-scale scientific applications alongside the Hadoop ecosystem, which is evolving into a multi-application platform.
Effective Distributed Supercomputing Resource Management for Large Scale Scientific Applications
Seungwoo Rho, Jik-Soo Kim, Sangwan Kim, Seoyoung Kim, Soonwook Hwang
Nationwide supercomputing infrastructures in Korea consist of geographically distributed supercomputing clusters. We developed High-Throughput Computing as a Service (HTCaaS) on top of these distributed national supercomputing clusters to make it easier for scientists to explore large-scale and complex scientific problems. In this paper, we present our mechanism for dynamically managing computing resources and show its effectiveness through a case study of a real scientific application, drug repositioning. Specifically, we show that resource utilization, accuracy, reliability, and usability can be improved by applying our resource management mechanism. The mechanism uses the waiting time and success rate of each computing resource to identify valid resources. The results show a reduction in total job completion time and an improvement in overall system throughput.
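The abstract does not give the details of the mechanism, so the following is only a minimal sketch of how resources might be filtered by waiting time and success rate. The class name, function name, field names, and both thresholds are hypothetical and are not taken from HTCaaS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceStats:
    """Observed statistics for one computing resource (hypothetical fields)."""
    name: str
    avg_wait_seconds: float  # mean queue waiting time observed on this resource
    succeeded: int           # number of tasks that completed successfully
    failed: int              # number of tasks that failed

    @property
    def success_rate(self) -> float:
        total = self.succeeded + self.failed
        # A resource with no history is treated as unproven (rate 0.0).
        return self.succeeded / total if total else 0.0


def select_valid_resources(
    stats: List[ResourceStats],
    max_wait_seconds: float = 600.0,  # assumed threshold, not from the paper
    min_success_rate: float = 0.9,    # assumed threshold, not from the paper
) -> List[ResourceStats]:
    """Keep resources that start tasks quickly and rarely fail,
    ordered by shortest average waiting time first."""
    valid = [
        s for s in stats
        if s.avg_wait_seconds <= max_wait_seconds
        and s.success_rate >= min_success_rate
    ]
    return sorted(valid, key=lambda s: s.avg_wait_seconds)


if __name__ == "__main__":
    observed = [
        ResourceStats("cluster-a", 120.0, 95, 5),
        ResourceStats("cluster-b", 900.0, 100, 0),
        ResourceStats("cluster-c", 60.0, 50, 50),
    ]
    for resource in select_valid_resources(observed):
        print(resource.name, resource.success_rate)
```

In this sketch a resource is excluded if either criterion fails; a real system might instead weight the two measures or re-evaluate them periodically as new tasks complete.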
A Case Study of Drug Repositioning Simulation based on Distributed Supercomputing Technology
Jik-Soo Kim, Seungwoo Rho, Minho Lee, Seoyoung Kim, Sangwan Kim, Soonwook Hwang
In this paper, we present a case study of a drug repositioning simulation based on distributed supercomputing technology, which requires highly efficient processing of large-scale computations. Drug repositioning is the application of known drugs and compounds to new indications (i.e., new diseases), and the process requires efficient processing of a large number of docking tasks with relatively short per-task execution times. This workload exhibits the main characteristics of a Many-Task Computing (MTC) application. As a representative MTC case, we ran a drug repositioning simulation on our HTCaaS system, which can leverage distributed supercomputing infrastructure. We show that efficient task dispatching, dynamic resource allocation and load balancing, reliability, and seamless integration of multiple computing resources are crucial to supporting these challenging scientific applications.

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr