Search : [ author: 염헌영 ] (4)

Performance Analysis of Concurrent Multitasking for Efficient Resource Utilization of GPUs

Sejin Kim, Qichen Chen, HeonYoung Yeom, Yoonhee Kim

http://doi.org/10.5626/JOK.2021.48.6.604

As Graphics Processing Units (GPUs) are widely used to accelerate compute-intensive applications, their use has expanded, especially in data centers and clouds. However, existing resource-sharing methods within a GPU are limited: they cannot efficiently handle concurrent execution requests from multiple cloud users while effectively utilizing the available system resources. In addition, it is difficult to partition resources within a GPU effectively without understanding application execution patterns. This paper proposes an application classification method based on execution patterns and analyzes run-time characteristics to explain why the performance of an application saturates at a certain point regardless of the resources allocated. In addition, we analyze the multitasking performance of co-allocated applications using smCompactor, a thread block-based scheduling framework, and identify near-best sets of co-allocated applications that effectively utilize the available system resources. In our results, performance improved by approximately 28% compared to NVIDIA MPS.

Optimizing Swap Use of Programs Using Memory Access Profiling

Yunjae Lee, Heon Y. Yeom, Hyuck Han

http://doi.org/10.5626/JOK.2020.47.5.466

Main memory is growing slowly while modern computing workloads require large amounts of memory, which makes main memory the bottleneck of system performance. Swapping provides programs with a large virtual memory by combining fast but small main memory with large secondary storage. However, programs cannot achieve optimal performance because the swapping policy is conservative and targets general workloads. The objective of this study was to analyze the memory access patterns of programs and to optimize how programs use swap according to those patterns. A low-overhead memory profiling technique and a simple optimization technique help programmers optimize their programs with ease. We optimized six workloads using these techniques and improved their performance by 43%.

Knowing the Cost of Synchronization Primitives on Modern Hardware

SeongJae Park, Hyuck Han, Heon Y. Yeom

http://doi.org/10.5626/JOK.2018.45.11.1210

In multi-core systems, which are now prevalent, it is important to use an efficient concurrency control algorithm that utilizes every core. However, Amdahl’s Law states that a program cannot scale infinitely if it contains any unscalable sub-section. Furthermore, the Laws of Order state that the expensive cost of synchronization for ordering in a concurrent algorithm cannot be eliminated. As a consequence, knowing the cost of each synchronization primitive is important for making tradeoff decisions when designing an algorithm. Although the rough costs of common synchronization primitives are already known, those figures may be inapplicable or inaccurate for a specific system, because the cost is hardware dependent. In this paper, we evaluate the cost of well-known synchronization primitives on a modern system and discuss the results.
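
The scaling limit that Amdahl's Law places on a program can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the paper; the function names and the 95% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's Law.

    parallel_fraction: share of the run time that scales with cores (0..1).
    cores: number of cores used (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def speedup_limit(parallel_fraction: float) -> float:
    """Speedup as the core count grows without bound: 1 / serial fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical program whose run time is 95% parallelizable.
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:5d} cores -> speedup bound {amdahl_speedup(0.95, n):.2f}")
    print(f"limit -> {speedup_limit(0.95):.2f}")
```

Even a small unscalable portion, such as a critical section guarded by a lock, caps the achievable speedup, which is why the per-operation cost of synchronization matters when choosing between algorithms.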

Performance Comparison between Hardware & Software Cache Partitioning Techniques

JiWoong Park, HeonYoung Yeom, Hyeonsang Eom

http://doi.org/

The era of multi-core processors began once clock speeds reached their limit. These days, multi-core technology is used not only in desktops, servers, and tablet PCs, but also in smartphones. In this architecture, processes always interfere with one another because they share system resources. To address this problem, cache partitioning is used, which can be roughly divided into two types: software and hardware cache partitioning. For dynamic cache partitioning, hardware cache partitioning is superior to software cache partitioning because it requires no page copying. In this paper, we compare the effectiveness of hardware and software cache partitioning on the AMD Opteron 6282 SE, which is the only commodity processor providing hardware cache partitioning, to see whether this technique can be effectively deployed in dynamic environments.
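
The page-copy cost of software partitioning can be illustrated with page coloring, a common way to partition a cache in software. The abstract does not name the software technique used in the paper, so treating it as page coloring is an assumption, as are the cache geometry (8 MiB, 16-way, 4 KiB pages), the simple modulo indexing, and all function names in this sketch.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def num_page_colors(cache_size: int, associativity: int,
                    page_size: int = PAGE_SIZE) -> int:
    """Number of page colors for a physically indexed cache.

    One cache way covers cache_size / associativity bytes; each page-sized
    slice of a way is one color.
    """
    way_size = cache_size // associativity
    return way_size // page_size


def page_color(phys_addr: int, n_colors: int,
               page_size: int = PAGE_SIZE) -> int:
    """Color of the physical page frame that contains phys_addr."""
    return (phys_addr // page_size) % n_colors


def pages_to_copy(frame_addrs, allowed_colors, n_colors):
    """Frames that fall outside the new partition and would have to be
    copied to frames of an allowed color when the partition changes."""
    return [a for a in frame_addrs
            if page_color(a, n_colors) not in allowed_colors]


if __name__ == "__main__":
    # Hypothetical shared cache: 8 MiB, 16-way set associative.
    colors = num_page_colors(8 * 1024 * 1024, 16)
    frames = [i * PAGE_SIZE for i in range(8)]
    # Shrinking a process to colors {0, 1} forces the other frames to move.
    print(colors, len(pages_to_copy(frames, {0, 1}, colors)))
```

Under these assumptions, every resize of a software partition moves data in memory, whereas a hardware mechanism changes the partition without relocating pages.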


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr