Search results for keyword "CUDA" (4)

Memory-Aware Eager Co-Scheduling for Multi-Tenant GPU Environments

Jeongjae Kim, Yunchae Choi, Hwansoo Han

http://doi.org/10.5626/JOK.2024.51.3.210

In a multi-tenant GPU environment, multiple applications are co-located on a single GPU to maximize utilization and throughput. However, co-location can cause out-of-memory errors. Previous research addressed this problem by co-scheduling only tasks whose combined memory demand does not exceed the total GPU memory capacity. We introduce two methods that allow additional tasks to be co-located on a GPU while still preventing out-of-memory errors. The first deallocates memory within a task as soon as it is no longer used, which frees GPU memory early and lets more tasks run concurrently. The second over-subscribes Unified Memory, so that tasks can be scheduled even when their combined memory usage exceeds the total GPU memory capacity. Both schemes reduce the execution time of multiple co-located tasks compared with the previous scheduling approach, improving performance by 7.3% and 1.9%, respectively.
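The difference between the three admission policies described above can be illustrated with a small sketch. This is not the authors' scheduler: the `Task` fields, the policy names, and `oversub_ratio` are hypothetical, and the sketch only shows why tracking memory that is freed early, or allowing a Unified Memory budget above physical capacity, admits tasks that a peak-reservation policy would reject.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    peak_mib: int     # peak device memory the task needs (hypothetical field)
    current_mib: int  # memory it still holds now; lower than peak after eager frees


def admit(running: List[Task], new_task: Task, capacity_mib: int,
          policy: str = "reserve_peak", oversub_ratio: float = 1.0) -> bool:
    """Return True if new_task may be co-located with the running tasks."""
    if policy == "reserve_peak":
        # Conservative baseline: every running task keeps its peak reserved.
        in_use = sum(t.peak_mib for t in running)
    elif policy == "eager_free":
        # Memory that running tasks released early counts as available.
        in_use = sum(t.current_mib for t in running)
    else:
        raise ValueError(f"unknown policy: {policy}")
    # oversub_ratio > 1.0 models Unified Memory over-subscription: the budget
    # may exceed physical capacity, with excess pages migrated to host memory.
    budget = capacity_mib * oversub_ratio
    return in_use + new_task.peak_mib <= budget


running = [Task("A", 6000, 6000), Task("B", 5000, 2000)]
new = Task("C", 6000, 6000)
print(admit(running, new, 16000))                       # 11000 + 6000 > 16000
print(admit(running, new, 16000, policy="eager_free"))  # 8000 + 6000 <= 16000
print(admit(running, new, 16000, oversub_ratio=1.2))    # 17000 <= 19200
```

The `eager_free` branch assumes that a task does not later grow back to its peak; a real scheduler would need a guarantee of that kind (or Unified Memory as a fallback) to stay free of out-of-memory errors.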

Parallel Algorithms for Finding δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets

Youngho Kim, Jeong Seop Sim

http://doi.org/10.5626/JOK.2017.44.8.760

Repetitive strings have been studied in diverse fields such as data compression and bioinformatics. Recently, two problems concerning approximate periods of strings over integer alphabets were introduced: finding minimum δ-approximate periods and finding minimum γ-approximate periods. Both problems can be solved in O(n²) time, where n is the length of the string. In this paper, we present two parallel algorithms that solve these two problems in O(n²) time using O(n²) threads. Experimental results show that, for n = 10,000, our parallel algorithms for finding minimum δ-approximate (resp. γ-approximate) periods run approximately 19.7 (resp. 40.08) times faster than the corresponding sequential algorithms.
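To make the δ-approximate case concrete, here is a minimal sequential sketch under an assumed definition that the abstract does not spell out: a length-p integer string y is taken to be a δ-approximate period of x when |x[i] − y[i mod p]| ≤ δ for every position i (so the trailing partial block is compared against a prefix of y). The function names are hypothetical, and this is not the paper's parallel algorithm. It does show why the work splits naturally: each candidate length p is an independent O(n) computation.

```python
from typing import List, Tuple


def min_delta_for_length(x: List[int], p: int) -> Tuple[int, List[int]]:
    """Smallest delta (and a witness y of length p) such that
    |x[i] - y[i % p]| <= delta for every i, under the assumed definition.

    Positions j, j+p, j+2p, ... all compare against y[j], so each column is
    solved independently: the integer center (lo + hi) // 2 of [lo, hi] is
    within ceil((hi - lo) / 2) of every value, which is optimal.
    """
    delta, y = 0, []
    for j in range(p):
        column = x[j::p]  # includes the trailing partial block
        lo, hi = min(column), max(column)
        c = (lo + hi) // 2
        y.append(c)
        delta = max(delta, c - lo, hi - c)
    return delta, y


def min_delta_approx_period(x: List[int], delta: int) -> Tuple[int, List[int]]:
    """Shortest length p (with witness y) whose required delta is within bound.

    Each candidate p costs O(n) and there are at most n candidates, so the
    whole search is O(n^2); the candidates are independent of each other,
    which is what makes a one-thread-per-candidate layout natural.
    """
    for p in range(1, len(x) + 1):
        d, y = min_delta_for_length(x, p)
        if d <= delta:
            return p, y
    raise ValueError("no period found (x is empty or delta < 0)")


print(min_delta_approx_period([1, 5, 2, 6, 1, 5], 1))
print(min_delta_approx_period([1, 5, 2, 6, 1, 5], 0))
```

With δ = 1 the shortest period has length 2 (witness [1, 5]); with δ = 0 it has length 4, because only an exact period is allowed.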

Parallel Range Query Processing with R-tree on Multi-GPUs

Hongsu Ryu, Mincheol Kim, Wonik Choi

http://doi.org/

Ever since the R-tree was proposed for indexing multi-dimensional data, many efforts have been made to improve its query performance. One common approach is to parallelize query processing on multi-core architectures, and to this end a GPU-based R-tree has recently been proposed. However, although a GPU-based R-tree can improve query performance, its ability to handle large volumes of data is limited because GPUs have limited physical memory. To address this problem, we propose MGR-tree (Multi-GPU R-tree), which manages large volumes of data by distributing its nodes across multiple GPUs. Our experiments show that MGR-tree is up to 9.1 times faster than a sequential search on a GPU and up to 1.6 times faster than a conventional GPU-based R-tree.
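The general idea of spreading an index across several devices and merging per-device answers can be sketched as follows. This is an illustration under stated assumptions, not MGR-tree itself: the tree is flattened to a single level of leaves built by a naive sort-and-chunk bulk load, leaves are assigned to devices round-robin, and Python threads stand in for GPUs. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Leaf = Tuple[Rect, List[Rect]]            # (bounding box of the leaf, its entries)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def build_leaves(rects: List[Rect], leaf_size: int) -> List[Leaf]:
    """Naive bulk load: sort by x-center, then cut into fixed-size leaves."""
    s = sorted(rects, key=lambda r: (r[0] + r[2]) / 2)
    chunks = [s[i:i + leaf_size] for i in range(0, len(s), leaf_size)]
    return [(mbr(c), c) for c in chunks]


def partition(leaves: List[Leaf], num_devices: int) -> List[List[Leaf]]:
    """Round-robin assignment of leaves to devices (one list per device)."""
    return [leaves[d::num_devices] for d in range(num_devices)]


def query_partition(leaves: List[Leaf], q: Rect) -> List[Rect]:
    hits: List[Rect] = []
    for box, entries in leaves:
        if intersects(box, q):  # skip the whole leaf if its MBR misses q
            hits.extend(r for r in entries if intersects(r, q))
    return hits


def multi_device_range_query(parts: List[List[Leaf]], q: Rect) -> List[Rect]:
    # One worker per "device"; each searches only the leaves it owns.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        per_device = list(pool.map(lambda p: query_partition(p, q), parts))
    # Every entry lives in exactly one leaf on one device, so just concatenate.
    return [r for hits in per_device for r in hits]


rects = [(i, i, i + 1, i + 1) for i in range(20)]
parts = partition(build_leaves(rects, 4), 3)
print(sorted(multi_device_range_query(parts, (4.5, 4.5, 7.5, 7.5))))
```

Because each device holds only its own share of the leaves, the total index size is bounded by the combined memory of all devices rather than by one GPU, which is the limitation the abstract targets.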

Parallel Algorithms for Finding Consensus of Circular Strings

Dong Hee Kim, Jeong Seop Sim

http://doi.org/

The consensus problem is to find a representative string, called a consensus, of a given set S of k strings. Circular strings differ from linear strings in that the last symbol precedes the first symbol. Given a set S of k circular strings of length n over an alphabet Σ, we first present an O(|Σ|n log n)-time parallel algorithm that uses O(n) threads to find a consensus of S minimizing both the radius and the distance sum when k = 3. We then present an O(|Σ|n² log n)-time parallel algorithm that uses O(n) threads to find a consensus of S minimizing the distance sum when k = 4. Finally, we compare the execution times of our CUDA implementations with those of the corresponding sequential algorithms.
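The objective being optimized can be pinned down with a naive exhaustive reference, useful for checking a fast implementation on tiny inputs. It assumes that the distance between circular strings is the minimum Hamming distance over all rotations, that the radius is the maximum distance from the candidate to any string in S, and that the distance sum adds those distances. The exhaustive search is a swapped-in technique for illustration, not the paper's algorithm, and it costs O(|Σ|^n · k · n²), so it is only practical for very small n. The function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def rotations(s: str) -> List[str]:
    return [s[r:] + s[:r] for r in range(len(s))]


def circ_dist(x: str, y: str) -> int:
    """Hamming distance between circular strings: minimum over rotations of y."""
    return min(sum(a != b for a, b in zip(x, rot)) for rot in rotations(y))


def brute_force_consensus(strings: List[str], alphabet: str) -> Tuple[Tuple[int, int], str]:
    """Return ((radius, distance_sum), consensus) minimizing radius first,
    then distance sum; if some string minimizes both, this key finds it."""
    n = len(strings[0])
    best = None
    for cand in product(alphabet, repeat=n):
        s = "".join(cand)
        dists = [circ_dist(s, x) for x in strings]
        key = (max(dists), sum(dists))
        if best is None or key < best[0]:
            best = (key, s)
    return best


print(brute_force_consensus(["abc", "bca", "cab"], "abc"))
print(circ_dist("aab", "abb"))
```

The three inputs in the first call are rotations of one another, so they share a consensus at radius 0 and distance sum 0; a linear-string view would instead see them as pairwise different in every position.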


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr