Pruning Deep Neural Networks Neurons for Improved Robustness against Adversarial Examples
Gyumin Lim, Gihyuk Ko, Suyoung Lee, Sooel Son
http://doi.org/10.5626/JOK.2023.50.7.588
Deep Neural Networks (DNNs) are vulnerable to adversarial examples, which can cause DNNs to misclassify their inputs. In this paper, we hypothesize that DNN activation patterns differ between normal data and adversarial examples. We propose an adversarial revision method that identifies neurons activated only by adversarial examples and not by normal data, and prunes those neurons from the DNN. We evaluated adversarial revision on the MNIST and CIFAR-10 datasets using adversarial examples produced by various generation techniques. On MNIST, the pruned DNNs achieved adversarial revision performance of up to 100% with label-wise pruning and 70.20% with all-label pruning, while maintaining classification accuracy above 99% on normal data. On CIFAR-10, classification accuracy on normal data decreased, but adversarial revision performance reached up to 99.37% and 47.61% with label-wise and all-label pruning, respectively. In addition, a comparative analysis with adversarial training methods confirmed the efficiency of the proposed pruning-based adversarial revision.
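The core idea (find neurons that fire on adversarial inputs but stay silent on normal data, then remove them) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: it covers a single layer in the all-label setting only, treats a neuron as "active" when its activation exceeds a threshold (0 here), and the function names `adversarial_only_neurons` and `prune_activations` and the toy activation values are hypothetical.

```python
import numpy as np


def adversarial_only_neurons(clean_acts, adv_acts, threshold=0.0):
    """Boolean mask of neurons active on some adversarial input but on no clean input.

    clean_acts, adv_acts: arrays of shape (num_samples, num_neurons) holding one
    layer's activations (e.g. post-ReLU outputs). A neuron counts as "active" on a
    sample when its activation exceeds `threshold` (an assumed criterion).
    """
    clean_active = (clean_acts > threshold).any(axis=0)  # fired on any clean sample
    adv_active = (adv_acts > threshold).any(axis=0)      # fired on any adversarial sample
    return adv_active & ~clean_active


def prune_activations(acts, mask):
    """Zero the outputs of masked neurons, removing them from the forward pass."""
    return np.where(mask, 0.0, acts)


# Hypothetical activations for a 4-neuron layer (rows = samples).
clean = np.array([[1.0, 0.0, 0.5, 0.0],
                  [0.2, 0.0, 0.0, 0.0]])
adv = np.array([[0.9, 0.7, 0.0, 0.0],
                [0.0, 0.0, 0.3, 0.0]])

mask = adversarial_only_neurons(clean, adv)
pruned = prune_activations(adv, mask)
print(mask)
print(pruned)
```

In this toy data only neuron 1 fires exclusively on adversarial samples, so it is the only one pruned. Zeroing a neuron's output is equivalent to removing it from the forward pass; in a real network one would instead delete or zero the corresponding weights. A label-wise variant would presumably repeat the selection separately per class, but the exact grouping is not described in this abstract.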
Survey on Feature Attribution Methods in Explainable AI
Gihyuk Ko, Gyumin Lim, Homook Cho
http://doi.org/10.5626/JOK.2020.47.12.1181
As artificial intelligence (AI)-based technologies are increasingly used in areas with significant socioeconomic impact, there is a growing effort to explain the decisions made by AI models. One important direction in such eXplainable AI (XAI) is the ‘feature attribution’ method, which explains an AI model's decisions by assigning a contribution score to each input feature. In this work, we surveyed nine recently developed feature attribution methods and categorized them using four different criteria. Based on these categorizations, we found that current methods focus only on specific settings, such as generating local, white-box explanations of neural networks, and lack theoretical foundations such as axiomatic definitions. Based on our findings, we suggest future research directions toward a unified feature attribution method.
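To make "assigning a contribution score to each input feature" concrete, here is a sketch of one well-known attribution technique, exact Shapley values, chosen purely for illustration; the abstract does not say whether it is among the nine surveyed methods. The sketch assumes that features absent from a coalition are replaced by a fixed baseline value, and the toy `model`, `x`, and `baseline` are hypothetical. Shapley values also illustrate what an axiomatic definition looks like: by the efficiency axiom, the scores sum to f(x) minus f(baseline).

```python
import math
from itertools import combinations


def shapley_attributions(f, x, baseline):
    """Exact Shapley value of each feature of x for model f.

    Features outside a coalition take their baseline value (an assumed
    "missing feature" convention). Cost is exponential in len(x).
    """
    n = len(x)

    def value(coalition):
        # Model output when only features in `coalition` keep their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def model(z):
    # Toy model: an additive term plus a two-feature interaction.
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_attributions(model, x, baseline)
print(phi)
```

For this toy model the attributions come out as 2, 3 and 3 (up to floating-point rounding): feature 0 receives its full additive contribution, and the interaction term z1 * z2 = 6 is split evenly between features 1 and 2, so the scores sum to f(x) - f(baseline) = 8. Exact enumeration is exponential in the number of features, which is why practical attribution methods approximate it.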
Journal of KIISE
- ISSN: 2383-630X (Print)
- ISSN: 2383-6296 (Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr