Search : [ author: Hayun Lee ] (4)

Accelerating DNN Models via Hierarchical N:M Sparsity

Seungmin Yu, Hayun Lee, Dongkun Shin

http://doi.org/10.5626/JOK.2024.51.7.583

N:M sparsity pruning is an effective approach for compressing deep neural networks by leveraging NVIDIA’s Sparse Tensor Core technology. Despite its effectiveness, this technique is constrained by hardware limitations, leading to fixed compression ratios and increased access to unnecessary input data, and it does not adequately address the imbalanced distribution of essential parameters. This paper proposes Hierarchical N:M (HiNM) sparsity, in which vector sparsity is applied prior to N:M sparsity to support various levels of sparsity. We also introduce a novel permutation technique tailored for HiNM sparsity, named 2-axis channel permutation (2CP). Experimental results showed that HiNM sparsity achieves a compression ratio twice that of conventional N:M sparsity while reducing latency by an average of 37%.
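
The abstract's two-stage idea can be illustrated in a few lines of NumPy: first prune whole column segments by magnitude (vector sparsity), then enforce N:M sparsity within each group of M consecutive weights. This is only a minimal sketch of the concept; the vector granularity, the L2 selection criterion, and all parameter names are assumptions rather than the paper's actual HiNM algorithm or its 2CP permutation.

```python
import numpy as np

def hinm_prune(weight, vec_len=4, vec_keep=0.5, n=2, m=4):
    """Two-stage pruning sketch: vector sparsity first, then N:M sparsity."""
    rows, cols = weight.shape
    assert rows % vec_len == 0 and cols % m == 0

    # Stage 1: vector sparsity -- drop the column segments (length vec_len)
    # with the smallest L2 norms, keeping a vec_keep fraction of them.
    blocks = weight.reshape(rows // vec_len, vec_len, cols)
    norms = np.linalg.norm(blocks, axis=1)                 # (num_blocks, cols)
    keep = int(np.ceil(vec_keep * norms.size))
    thresh = np.sort(norms, axis=None)[::-1][keep - 1]
    w = (blocks * (norms >= thresh)[:, None, :]).reshape(rows, cols)

    # Stage 2: N:M sparsity -- in every group of m consecutive weights along
    # a row, keep only the n largest magnitudes (e.g., 2:4).
    groups = w.reshape(rows, cols // m, m)
    order = np.argsort(-np.abs(groups), axis=-1)
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, order[..., :n], True, axis=-1)
    return (groups * mask).reshape(rows, cols)

w = np.random.randn(8, 16)
pruned = hinm_prune(w)   # every 4-weight row group now has at most 2 nonzeros
```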

Optimizing Computation of Tensor-Train Decomposed Embedding Layer

Seungmin Yu, Hayun Lee, Dongkun Shin

http://doi.org/10.5626/JOK.2023.50.9.729

Personalized recommendation systems are ubiquitous in daily life. However, the huge memory requirement for storing the embedding tables used by deep learning-based recommendation models takes up most of the resources of industrial AI data centers. One solution to this problem is Tensor-Train (TT) decomposition, a promising compression technique for deep neural networks. In this study, we analyze unnecessary computations in Tensor-Train Gather and Reduce (TT-GnR), the embedding-layer operation when TT decomposition is applied. To eliminate them, we define a computational unit called a group that binds item vectors together and propose the Group Reduced TT-Gather and Reduce (GRT-GnR) operation, which reduces unnecessary operations by computing over groups. Since the GRT-GnR operation is computed in groups, its computational cost varies depending on how item vectors are grouped. Experimental results showed that the GRT-GnR operation reduced latency by 41% compared to the conventional TT-GnR operation.
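
A toy example helps show where grouped computation saves work. The sketch below is an assumption-laden illustration with only two TT cores (the paper's core count, shapes, and grouping criterion are not reproduced): it contrasts a baseline gather-then-reduce with a variant that first accumulates the first-core slices of items sharing the same second-core index, so the second contraction runs once per group rather than once per item.

```python
import numpy as np

# Toy two-core TT-decomposed embedding table (illustrative shapes only).
I1, I2, D1, D2, R = 8, 8, 4, 4, 3       # item index = i1 * I2 + i2, dim = D1 * D2
G1 = np.random.randn(I1, D1, R)          # first TT core
G2 = np.random.randn(I2, R, D2)          # second TT core

def tt_gnr(bag):
    """Baseline TT Gather-and-Reduce: reconstruct each item vector, then sum."""
    out = np.zeros((D1, D2))
    for idx in bag:
        i1, i2 = divmod(idx, I2)
        out += G1[i1] @ G2[i2]           # one core contraction per item
    return out.reshape(-1)

def grt_gnr(bag):
    """Grouped variant: items sharing the same i2 are partially reduced first,
    so the second-core contraction runs once per group instead of per item."""
    partial = {}                          # i2 -> accumulated first-core slice
    for idx in bag:
        i1, i2 = divmod(idx, I2)
        partial[i2] = partial.get(i2, 0) + G1[i1]
    out = sum(acc @ G2[i2] for i2, acc in partial.items())
    return out.reshape(-1)

bag = [3, 11, 19, 20]                     # one pooled lookup (e.g., a user's items)
assert np.allclose(tt_gnr(bag), grt_gnr(bag))
```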

Code Generation and Data Layout Transformation Techniques for Processing-in-Memory

Hayun Lee, Gyungmo Kim, Dongkun Shin

http://doi.org/10.5626/JOK.2023.50.8.639

Processing-in-Memory (PIM) capitalizes on internal parallelism and bandwidth within memory systems, thereby achieving superior performance to CPUs or GPUs in memory-intensive operations. Although many PIM architectures have been proposed, compiler support for PIM is not yet well studied. To generate efficient program code for PIM devices, a PIM compiler must optimize operation schedules and data layouts. Additionally, register reuse in the PIM processing units must be maximized to reduce data-movement traffic between the host and PIM devices. We propose a PIM compiler that supports various PIM architectures and achieves up to a 2.49-times performance improvement in GEMV operations through register-reuse optimization.
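
The register-reuse idea the abstract highlights can be mimicked with a small host-side simulation. The sketch below is not the proposed compiler; it simply counts host writebacks for a tiled GEMV schedule, keeping each partial sum in a single accumulator (modeling a PIM register) versus flushing it back to the host after every tile. The names and the transfer counting are illustrative assumptions.

```python
import numpy as np

def gemv_pim(matrix, vec, tile=16, reuse_register=True):
    """Toy GEMV schedule: with reuse, each output element's partial sum stays
    in a PIM-side accumulator across input tiles; without it, partial sums
    are written back to the host after every tile."""
    rows, cols = matrix.shape
    out = np.zeros(rows)
    host_writebacks = 0
    for r in range(rows):
        acc = 0.0                                  # models one PIM register
        for c0 in range(0, cols, tile):
            acc += matrix[r, c0:c0 + tile] @ vec[c0:c0 + tile]
            if not reuse_register:
                out[r] += acc                      # flush partial sum to host
                acc = 0.0
                host_writebacks += 1
        if reuse_register:
            out[r] = acc                           # single writeback per row
            host_writebacks += 1
    return out, host_writebacks

A, x = np.random.randn(64, 256), np.random.randn(256)
y_reuse, t_reuse = gemv_pim(A, x, reuse_register=True)
y_flush, t_flush = gemv_pim(A, x, reuse_register=False)
assert np.allclose(y_reuse, A @ x) and np.allclose(y_flush, A @ x)
print(f"host writebacks: {t_reuse} with reuse vs {t_flush} without")
```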

Performance and Energy Comparison of Different BLAS and Neural Network Libraries for Efficient Deep Learning Inference on ARM-based IoT Devices

Hayun Lee, Dongkun Shin

http://doi.org/10.5626/JOK.2019.46.3.219

Cloud computing is generally used to perform deep learning for IoT devices. However, this approach has limitations such as connection instability, energy consumption for communication, and security vulnerabilities. To solve these problems, there have been recent attempts to perform deep learning directly on IoT devices. These attempts mainly propose lightweight deep learning models or compression techniques for IoT devices, but they lack analysis of the actual effect on real IoT devices. Since each IoT device has a different configuration of processing units and supported libraries, various execution environments must be analyzed on each device to perform optimized deep learning. In this study, the performance and energy of IoT devices with various hardware configurations were measured and analyzed according to the deep learning model, library, and compression technique applied. The results showed that using the appropriate libraries improves speed and energy efficiency by up to 13.3 times and 48.5 times, respectively.
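
The kind of measurement the study performs can be approximated with a simple timing harness. The sketch below only illustrates the latency side of such a comparison, using a naive pure-Python GEMM against NumPy's BLAS-backed routine as stand-ins for different libraries; the paper's actual models, ARM libraries, and energy measurements (which require external instrumentation) are not reproduced here.

```python
import time
import numpy as np

def time_op(fn, repeats=3):
    """Average wall-clock time of fn() over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Stand-ins for two "libraries": a naive Python GEMM vs. a BLAS-backed one.
N = 128
A = np.random.randn(N, N)
B = np.random.randn(N, N)

def naive_gemm():
    return [[sum(A[i, k] * B[k, j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def blas_gemm():
    return A @ B

for name, fn in [("naive", naive_gemm), ("BLAS-backed", blas_gemm)]:
    print(f"{name}: {time_op(fn) * 1e3:.2f} ms per GEMM")
```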


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr