Accelerating DNN Models via Hierarchical N:M Sparsity 


Vol. 51, No. 7, pp. 583-591, Jul. 2024
10.5626/JOK.2024.51.7.583



  Abstract

N:M sparsity pruning is an effective approach for compressing deep neural networks by leveraging NVIDIA’s Sparse Tensor Core technology. Despite its effectiveness, the technique is constrained by hardware limitations: it supports only fixed compression ratios, incurs accesses to unnecessary input data, and does not adequately address the imbalanced distribution of essential parameters. This paper proposes Hierarchical N:M (HiNM) sparsity, in which vector sparsity is applied prior to N:M sparsity to enable various levels of sparsity. We also introduce a novel permutation technique tailored for HiNM sparsity, named 2-axis channel permutation (2CP). Experimental results show that HiNM sparsity achieves a compression ratio twice that of conventional N:M sparsity while reducing latency by an average of 37%.
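To make the two-stage idea concrete, below is a minimal NumPy sketch of a HiNM-style pruning pipeline: vector-level sparsity followed by N:M sparsity. Everything here is an illustrative assumption rather than the paper's actual algorithm: the vector length v, the keep ratio, the vector orientation, and both helper functions (vector_prune, nm_prune) are hypothetical, and the 2CP permutation step that rebalances essential parameters across channels before pruning is omitted.

import numpy as np

def vector_prune(w, v=8, keep_ratio=0.5):
    """Stage 1 (hypothetical): vector-level sparsity.
    Splits each column into length-v vectors and zeroes the
    lowest-magnitude vectors by L2 norm, keeping keep_ratio of them."""
    rows, cols = w.shape
    assert rows % v == 0
    vecs = w.reshape(rows // v, v, cols)           # vecs[i, :, j] is one vector
    norms = np.linalg.norm(vecs, axis=1)           # (rows // v, cols)
    k = max(1, int(norms.size * keep_ratio))
    thresh = np.partition(norms.ravel(), -k)[-k]   # k-th largest vector norm
    mask = (norms >= thresh)[:, None, :]           # broadcast over vector axis
    return (vecs * mask).reshape(rows, cols)

def nm_prune(w, n=2, m=4):
    """Stage 2: N:M sparsity -- keep the n largest-magnitude weights in
    every contiguous group of m along each row, the pattern NVIDIA's
    Sparse Tensor Cores accelerate for 2:4."""
    rows, cols = w.shape
    assert cols % m == 0
    groups = w.reshape(rows, cols // m, m)
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]  # smallest m-n
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

# HiNM-style pipeline: vector sparsity first, then N:M on the survivors.
w = np.random.randn(16, 16).astype(np.float32)
w_hinm = nm_prune(vector_prune(w), n=2, m=4)

In a deployable layout, the vectors zeroed in stage 1 would presumably be compacted away before applying N:M, so the overall density becomes keep_ratio × N/M (0.5 × 2/4 = 25% here) versus 50% for plain 2:4, consistent with the doubled compression ratio reported in the abstract; the sketch skips the compaction step for brevity.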




  Cite this article

[IEEE Style]

S. Yu, H. Lee, D. Shin, "Accelerating DNN Models via Hierarchical N:M Sparsity," Journal of KIISE, JOK, vol. 51, no. 7, pp. 583-591, 2024. DOI: 10.5626/JOK.2024.51.7.583.


[ACM Style]

Seungmin Yu, Hayun Lee, and Dongkun Shin. 2024. Accelerating DNN Models via Hierarchical N:M Sparsity. Journal of KIISE, JOK, 51, 7, (2024), 583-591. DOI: 10.5626/JOK.2024.51.7.583.


[KCI Style]

Seungmin Yu, Hayun Lee, Dongkun Shin, "Accelerating DNN Models via Hierarchical N:M Sparsity," Journal of KIISE, vol. 51, no. 7, pp. 583-591, 2024. DOI: 10.5626/JOK.2024.51.7.583.




