Estudo Geral
Master thesis . 2022
Data sources: Estudo Geral

Adversarial Attacks to Classification Systems

Author: Leal, João Miguel Gouveia

Abstract


Adversarial samples are inputs corrupted with inconspicuous perturbations that a given target model misclassifies. Adversaries craft adversarial samples using various methods that depend on the information available about the target system. In a white-box scenario, adversaries have full access to the model; in a black-box scenario, usually only the output layer is accessible. Researchers have developed adversarial samples that can fool target models even when the adversary has almost no information about the target system. To construct classifiers robust to adversarial samples, many authors have proposed adversarial defenses, mechanisms intended to protect deep learning models from adversarial attacks. However, many of these defenses have been shown to fail, which indicates that building robust models is an extremely difficult task. Motivated by this, frameworks have been developed that group various adversarial attacks so that users can test their models; however, none of them provides a pipeline mechanism, and the information they give about the robustness of the tested models is scarce. Several frameworks have also stopped receiving support, leaving them with outdated attacks and considerable overlap between the attacks they offer. In this dissertation, a new framework was developed with a pipeline mechanism that allows users to input their models and to choose, from the eight currently supported adversarial attacks, those to be used in the pipeline run. After the pipeline executes, each model obtains a score based on its performance against all of the images generated by the adversarial attacks, allowing a better understanding of that model's robustness. To test the validity and capabilities of the framework, an experiment was performed using the pipeline mechanism, models trained on an image classification dataset, and the eight supported adversarial attacks. The results obtained allow a deeper understanding of the robustness of the models.
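As an illustration of the white-box setting described above, the sketch below applies an FGSM-style perturbation (perturbing the input by epsilon times the sign of the loss gradient) to a hand-rolled logistic regression classifier. This is a minimal illustrative example only; the dissertation does not specify its eight attacks, and all model parameters and values here are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(np.dot(w, x) + b)

def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method sketch: move x in the direction
    that increases the loss, bounded by eps in the L-infinity norm."""
    p = predict(w, b, x)
    # Gradient of the binary cross-entropy loss with respect to x
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical white-box model and clean sample (true label 1)
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, -0.5])
y = 1.0

x_adv = fgsm(w, b, x, y, eps=0.6)
print(predict(w, b, x) > 0.5)      # clean sample classified as 1 -> True
print(predict(w, b, x_adv) > 0.5)  # adversarial sample flips to 0 -> False
```

A black-box attacker, by contrast, could not compute `grad_x` directly and would have to estimate it from model outputs alone.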
The evaluation of a model should not be based only on its accuracy on adversarial samples; it should also take into account the amount of perturbation a sample needs in order to fool the target classifier.
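In that spirit, a robustness score could weight each successful attack by the perturbation budget it required, rather than counting misclassifications alone. The function below is a hypothetical sketch, not the dissertation's actual scoring formula, and the weighting scheme and all sample values are assumptions.

```python
def robustness_score(outcomes):
    """outcomes: list of (fooled, eps) pairs, one per adversarial
    sample, where eps is the L-infinity perturbation the attack used.
    Resisted attacks earn full credit; successful attacks earn partial
    credit proportional to how large a perturbation they needed."""
    if not outcomes:
        return 1.0
    score = 0.0
    for fooled, eps in outcomes:
        if not fooled:
            score += 1.0
        else:
            score += min(eps, 1.0)
    return score / len(outcomes)

# Model A is only fooled by large perturbations; Model B by tiny ones.
model_a = [(True, 0.5), (True, 0.6), (False, 0.0)]
model_b = [(True, 0.05), (True, 0.05), (False, 0.0)]
print(robustness_score(model_a) > robustness_score(model_b))  # prints True
```

Under this scheme two models with identical adversarial accuracy can still receive different scores, capturing the distinction the abstract draws between accuracy and required perturbation.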

Other - Project co-financed by COMPETE 2020 and the European Union. Project reference: POCI-01-0247-FEDER-046969

Master's dissertation in Informatics Engineering presented to the Faculty of Sciences and Technology

Country
Portugal
Keywords

Performance Metrics, Deep Learning, Adversarial Learning, Robustness, Adversarial Attacks

  • Impact indicators provided by BIP!
    selected citations (derived from selected sources): 0
    popularity (current attention in the research community, based on the underlying citation network): Average
    influence (overall/total impact, based on the underlying citation network, diachronically): Average
    impulse (initial momentum directly after publication): Average
Access route: Green