Digital Library [Search Results]
Pruning Deep Neural Networks Neurons for Improved Robustness against Adversarial Examples
Gyumin Lim, Gihyuk Ko, Suyoung Lee, Sooel Son
http://doi.org/10.5626/JOK.2023.50.7.588
Deep Neural Networks (DNNs) are vulnerable to adversarial examples, which can cause them to produce incorrect classifications. In this paper, we assume that the activation patterns of a DNN differ between normal data and adversarial examples. We propose a revision method that identifies neurons activated only by adversarial examples and not by normal data, and prunes those neurons from the DNN. We performed adversarial revision with various adversarial example generation techniques on the MNIST and CIFAR-10 datasets. On MNIST, pruning the identified neurons achieved adversarial revision performance of up to 100% and 70.20%, depending on the pruning method (label-wise and all-label pruning), while maintaining classification accuracy above 99% on normal data. On CIFAR-10, classification accuracy on normal data decreased, but adversarial revision performance increased to up to 99.37% and 47.61%, depending on the pruning method. In addition, a comparative analysis with adversarial training methods confirmed the efficiency of the proposed pruning-based adversarial revision.
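The core idea of the abstract, identifying neurons that activate only on adversarial inputs and pruning them, can be illustrated with a minimal sketch. This is not the authors' code: the toy MLP, the choice of hidden layer, the activation threshold, and the zero-weight pruning are all illustrative assumptions, and the label-wise vs. all-label distinction from the paper is not modeled.

```python
# Sketch: find hidden neurons that fire only on adversarial inputs and
# prune them by zeroing their weights. Model and threshold are assumptions.
import torch
import torch.nn as nn

# Toy MLP standing in for the classifier under test (assumption).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),   # hidden layer to be pruned
    nn.Linear(256, 10),
)

def active_neurons(model, inputs, threshold=0.0):
    """Boolean mask of hidden neurons whose ReLU output exceeds
    `threshold` on at least one input in the batch."""
    acts = {}
    def hook(_m, _i, out):
        acts["h"] = out.detach()
    handle = model[2].register_forward_hook(hook)  # ReLU after first Linear
    with torch.no_grad():
        model(inputs)
    handle.remove()
    return (acts["h"] > threshold).any(dim=0)      # shape: (256,)

# normal_x, adv_x: batches of clean and adversarial inputs (placeholders here).
normal_x = torch.rand(64, 1, 28, 28)
adv_x = torch.rand(64, 1, 28, 28)

normal_mask = active_neurons(model, normal_x)
adv_mask = active_neurons(model, adv_x)

# Neurons that activate only on adversarial inputs.
adv_only = adv_mask & ~normal_mask

# "Prune" them by zeroing their incoming and outgoing weights,
# so they no longer contribute to the output.
with torch.no_grad():
    model[1].weight[adv_only] = 0.0     # incoming weights of pruned neurons
    model[1].bias[adv_only] = 0.0
    model[3].weight[:, adv_only] = 0.0  # outgoing weights to the logits
```

After pruning, one would re-evaluate the model on both clean and adversarial data, mirroring the accuracy/revision trade-off reported in the abstract.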
Generating Counterfactual Examples through Generating Adversarial Examples
http://doi.org/10.5626/JOK.2022.49.12.1132
The advance of artificial intelligence (AI) has brought numerous conveniences. However, the complex structure of AI models makes it challenging to understand their inner workings. Counterfactual explanation is a method that explains AI using counterfactual examples, in which minimal yet perceptible perturbations are applied to change the classification result. Adversarial examples are data modified to cause AI models to misclassify them. Unlike counterfactual examples, the perturbations applied to adversarial examples are difficult for humans to perceive. In a simple model, generating adversarial examples is similar to generating counterfactual examples; in deep learning, however, the two differ because the cognitive gap between humans and deep learning models is often large. Nevertheless, we confirmed that adversarial examples generated by certain deep learning models were similar to counterfactual examples. In this paper, we analyzed the structure and conditions of deep learning models for which adversarial examples were similar to counterfactual examples. We also proposed a new metric, partial concentrated change (PCC), and compared adversarial examples generated from different models using existing metrics and the proposed PCC.
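The link between adversarial and counterfactual examples can be sketched as follows: perturb an input just enough to flip the model's prediction and keep the perturbed input as a counterfactual candidate. This is only an illustrative sketch, not the paper's method: the toy classifier, the FGSM-style attack, and the epsilon schedule are assumptions, and the paper's PCC metric is not reproduced here.

```python
# Sketch: use an adversarial attack (FGSM-style) as a counterfactual-example
# generator by searching for the smallest perturbation that flips the class.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier (assumption)

def counterfactual_candidate(model, x, epsilons=(0.01, 0.05, 0.1, 0.2)):
    """Return the smallest-epsilon FGSM perturbation of `x` that changes
    the predicted class, or None if no tried epsilon flips it."""
    model.eval()
    x = x.clone().requires_grad_(True)
    logits = model(x)
    original_class = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, original_class)
    loss.backward()
    grad_sign = x.grad.sign()

    for eps in epsilons:  # try increasing perturbation sizes
        x_adv = (x + eps * grad_sign).detach().clamp(0.0, 1.0)
        new_class = model(x_adv).argmax(dim=1)
        if (new_class != original_class).all():
            return x_adv, original_class, new_class
    return None

x = torch.rand(1, 1, 28, 28)
result = counterfactual_candidate(model, x)
if result is not None:
    x_cf, before, after = result
    print(f"prediction changed from {before.item()} to {after.item()}")
```

Whether such a candidate also reads as a counterfactual explanation depends on the model structure and conditions analyzed in the paper, since the perturbation must remain meaningful to a human observer.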