TY - JOUR
T1 - PGB: Permutation and Grouping for BERT Pruning
AU - Lim, Hye-Min
AU - Choi, Dong-Wan
JO - Journal of KIISE, JOK
PY - 2023
DA - 2023/1/14
DO - 10.5626/JOK.2023.50.6.503
KW - BERT compression
KW - task-specific pruning
KW - structured pruning
KW - head pruning
AB - Recently, pre-trained Transformer-based models have been widely used for various artificial intelligence tasks, such as natural language processing and image recognition. However, these models have billions of parameters, require significant computation for inference, and are therefore difficult to deploy in resource-limited environments. To address this problem, we propose PGB (Permutation Grouped BERT pruning), a new group-based structured pruning method for Transformer models. PGB efficiently searches for an optimal permutation of attention heads under given resource constraints and prunes unnecessary heads according to their importance, so as to minimize information loss in the model. Through extensive comparative experiments on the pre-trained BERT model, PGB outperforms existing structured pruning methods in terms of both inference speed and accuracy loss.
ER -