
Exploring adversarial attacks and defenses

Author: Branco, Tiago Manuel Sampaio


Abstract

Deep Learning classifiers are capable of outstanding performance. Yet they are vulnerable to adversarial attacks: it is possible to craft a slightly modified version of a correctly classified image that, although its contents are still clearly recognisable to a human being, the classifier assigns an incorrect label. In this thesis we evaluate the effectiveness of adversarial attacks, namely their transferability to other models, and of some proposed defenses. Transferability occurs when an adversarial sample crafted with one model also achieves a misclassification in another model.

To make this study as comprehensive as possible, we explore several attack methods, namely: Fast Gradient Sign Method (FGSM), DeepFool, Jacobian Saliency Map Attack (JSMA), Carlini, Projected Gradient Descent (PGD) and Few Pixels. To evaluate the impact of the model's architecture on the transferability rate, we use several common architectures: VGG16, three ResNets with different depths, and a small Convolutional Neural Network. Two common datasets were used for evaluation: CIFAR-10 and the German Traffic Sign Recognition Benchmark (GTSRB).

Different attack methods use different approaches and parameters to craft adversarial samples, so it is not trivial to control the degree of perturbation. To achieve the same level of perturbation with every method, we resorted to an image comparison metric: the Structural Similarity Index Measure (SSIM). For each method we performed a search within its parameter space to find the parameters that on average attain a specific level of perturbation. To evaluate the impact of the level of perturbation on transferability rates, we evaluate two different values for the SSIM metric. Our results show that while it is possible to craft an adversarial sample on a particular model, the transferability rates vary considerably from method to method.
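To illustrate the simplest of the attack methods named above, here is a minimal numpy sketch of FGSM on a toy logistic model. This is not the thesis implementation (which targets deep convolutional networks): the model, weights, and inputs are hypothetical, chosen only so the gradient and the sign step are explicit.

```python
import numpy as np

def fgsm(x, y, w, eps):
    """FGSM on a logistic model p(y=+1 | x) = sigmoid(w . x).

    x: input vector, y: label in {-1, +1}, w: model weights,
    eps: L-infinity perturbation budget.
    Returns x perturbed one eps-sized step along the sign of the
    input gradient of the negative log-likelihood.
    """
    margin = y * np.dot(w, x)
    # gradient of -log sigmoid(y * w.x) with respect to the input x
    grad = -y * (1.0 / (1.0 + np.exp(margin))) * w
    return x + eps * np.sign(grad)

# hypothetical model and a correctly classified point (w.x = 2.0 > 0)
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, -0.5, 1.0])
x_adv = fgsm(x, y=+1, w=w, eps=0.9)
# the step pushes each coordinate against the weight signs,
# so w.x_adv falls below the decision boundary
```

Every coordinate moves by exactly eps, which is why FGSM is the canonical L-infinity-bounded attack; the stronger methods in the abstract (PGD, Carlini, DeepFool) refine this single step iteratively or optimise the perturbation directly.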
Regarding defensive methods, we explored Adversarial Training and Defensive Distillation. The results show that the ability to prevent an adversarial attack, i.e. robustness, varies significantly depending on the conditions under which the attack is performed and on the defensive method used. Furthermore, there is a trade-off between robustness and accuracy, with defended models having lower accuracy than non-defended models.
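The abstract's parameter search, finding the attack strength that attains a chosen similarity level, can be sketched as a binary search, assuming similarity falls monotonically as the attack strength grows. This is a self-contained toy: the `similarity` proxy stands in for the full SSIM used in the thesis, and the `attack` lambda is a hypothetical one-parameter perturbation, not one of the listed methods.

```python
import numpy as np

def similarity(a, b):
    """Stand-in for SSIM: 1 minus the mean squared difference.
    The thesis uses the full SSIM; this proxy keeps the sketch
    dependency-free while preserving 'higher = more similar'."""
    return 1.0 - float(np.mean((a - b) ** 2))

def search_eps(attack, x, target_sim, lo=0.0, hi=1.0, iters=30):
    """Binary-search the attack strength eps so that the adversarial
    sample attack(x, eps) reaches the target similarity with the
    clean input. Assumes similarity is monotonically decreasing in eps."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if similarity(x, attack(x, mid)) > target_sim:
            lo = mid   # still too similar: strengthen the attack
        else:
            hi = mid   # overshot: weaken it
    return (lo + hi) / 2.0

# hypothetical attack: shift every pixel by eps
x = np.zeros(16)
attack = lambda x, eps: x + eps
eps = search_eps(attack, x, target_sim=0.96)
# with this proxy, similarity(x, x + eps) = 1 - eps**2, so eps -> 0.2
```

In the thesis the same idea is applied per attack method, averaging the SSIM over many samples rather than matching it on a single input.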

Country
Portugal
Keywords

DeepFool, PGD, Adversarial attacks, Defensive distillation, Carlini, JSMA, GTSRB, SSIM, FGSM, CIFAR-10, Adversarial training
