
handle: 1822/93917
Deep Learning classifiers are capable of outstanding performance. Yet, they are vulnerable to adversarial attacks: it is possible to craft a slightly modified version of a correctly classified image for which the classifier outputs an incorrect classification, even though its contents remain clearly recognisable to a human being. In this thesis we evaluate the effectiveness of adversarial attacks, namely their transferability to other models, and some proposed defenses. Transferability occurs when an adversarial sample crafted against one model also succeeds in causing a misclassification in another model.

To make this study as comprehensive as possible, we explore several attack methods, namely: Fast Gradient Sign Method (FGSM), DeepFool, Jacobian Saliency Map Attack (JSMA), Carlini, Projected Gradient Descent (PGD) and Few Pixels. To evaluate the impact of the model's architecture on the transferability rate, we use several common architectures: VGG16, three ResNets with different depths, and a small Convolutional Neural Network. Two common datasets were used for evaluation: CIFAR-10 and the German Traffic Sign Recognition Benchmark (GTSRB).

Different attack methods use different approaches and parameters to craft adversarial samples, so it is not trivial to control the degree of perturbation. To achieve the same level of perturbation with every method, we resorted to an image comparison metric, the Structural Similarity Index Measure (SSIM). For each method we performed a search within its parameter space to find the parameters that, on average, attain a specific level of perturbation. To evaluate the impact of the level of perturbation on transferability rates, we evaluate two different values for the SSIM metric. Our results show that while it is possible to craft an adversarial sample against a particular model, the transferability rates vary considerably from method to method.

Regarding defensive methods, we explored Adversarial Training and Defensive Distillation. The results show that the ability to prevent an adversarial attack, or robustness, varies significantly depending on the conditions under which the attack is performed and on the defensive methods used. Furthermore, there is a trade-off between robustness and accuracy, with defended models having lower accuracy than non-defended models.
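As a hedged illustration of two steps described above, the sketch below shows an FGSM-style attack and a naive grid search over the attack parameter (here, epsilon) until the average SSIM of the crafted samples is closest to a target value. This is a minimal sketch, not the thesis implementation: the model, the epsilon grid, the SSIM target and the use of PyTorch plus scikit-image are assumptions made for illustration.

```python
# Minimal sketch (illustrative, not the thesis code): FGSM plus a naive
# parameter search that picks the epsilon whose adversarial samples reach
# a target average SSIM, mirroring the per-method parameter search above.
import torch
import torch.nn.functional as F
from skimage.metrics import structural_similarity as ssim  # assumes skimage >= 0.19


def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: one step along the sign of the loss
    gradient with respect to the input image."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()


def mean_ssim(clean, adv):
    """Average SSIM between clean and adversarial batches (NCHW in [0, 1])."""
    scores = []
    for c, a in zip(clean, adv):
        scores.append(
            ssim(c.detach().cpu().permute(1, 2, 0).numpy(),
                 a.detach().cpu().permute(1, 2, 0).numpy(),
                 channel_axis=-1, data_range=1.0)
        )
    return sum(scores) / len(scores)


def search_eps(model, x, y, target_ssim, eps_grid):
    """Return the epsilon from eps_grid whose adversarial batch has an
    average SSIM closest to the requested target level of perturbation."""
    best_eps, best_gap = None, float("inf")
    for eps in eps_grid:
        gap = abs(mean_ssim(x, fgsm(model, x, y, eps)) - target_ssim)
        if gap < best_gap:
            best_eps, best_gap = eps, gap
    return best_eps
```

In practice each attack method has its own parameters (e.g. number of iterations or pixels changed), so the same SSIM-targeted search would be repeated per method over that method's parameter space.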
DeepFool, PGD, Adversarial attacks, Defensive distillation, Carlini, JSMA, GTSRB, SSIM, FGSM, CIFAR-10, Adversarial training
