A Data Imbalance Minimization Strategy for Scalable Deep Learning Training
Sanha Maeng, Euhyun Moon, Sungyong Park
http://doi.org/10.5626/JOK.2023.50.10.836
As deep neural network training is compute-intensive and takes a very long time, distributed training on clusters with multiple graphics processing units (GPUs) has been widely adopted. Distributed training of deep neural networks is severely slowed by stragglers, i.e., the slowest workers, and previous studies have proposed solutions to this straggler problem. However, existing approaches assume that all data samples, such as images, have a constant size; they do not recognize the data imbalance caused by samples of varying sizes, such as video and audio, while solving the straggler problem. In this paper, we propose a data imbalance minimization (DIM) strategy that minimizes data imbalance to solve the straggler problem caused by variable-size data samples. Our evaluation on eight NVIDIA Tesla T4 GPUs shows that DIM achieves up to a 1.77x speedup over state-of-the-art systems with comparable scalability.
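The core idea the abstract describes, balancing variable-size samples across workers so that no single worker becomes a straggler, can be illustrated with a short sketch. The code below is not the authors' DIM implementation; it assumes a simple cost model in which a sample's size (e.g., video length) is a proxy for its compute time, and the function name assign_balanced is hypothetical. It uses greedy longest-processing-time partitioning to keep per-worker load roughly even.

```python
# A minimal sketch of size-aware sample assignment (not the paper's DIM
# algorithm): greedily place the largest remaining sample on the currently
# least-loaded worker so per-worker workload stays balanced.
import heapq

def assign_balanced(sample_sizes, num_workers):
    """Return one list of sample indices per worker, with roughly equal
    total size (greedy longest-processing-time partitioning)."""
    # Min-heap of (current_load, worker_id); the least-loaded worker is on top.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    # Assign the largest samples first so later, smaller samples
    # can even out the remaining load differences.
    for idx in sorted(range(len(sample_sizes)),
                      key=lambda i: -sample_sizes[i]):
        load, w = heapq.heappop(heap)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + sample_sizes[idx], w))
    return assignment

# Example: 8 variable-size samples (e.g., video lengths) across 4 workers.
sizes = [120, 30, 45, 200, 75, 60, 90, 15]
for w, idxs in enumerate(assign_balanced(sizes, 4)):
    total = sum(sizes[i] for i in idxs)
    print(f"worker {w}: samples {idxs}, total size {total}")
```

Running the example prints per-worker totals that are close to one another, whereas a naive round-robin split of the same samples can leave one worker with far more work than the rest, which is exactly the imbalance that produces stragglers.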

Journal of KIISE
- ISSN: 2383-630X (Print)
- ISSN: 2383-6296 (Electronic)
- KCI Accredited Journal