Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples

Publication: Article, Preprint; 01 Jul 2020; Embargo end date: 01 Jan 2020; Publisher: IEEE; Journal: 2020 International Joint Conference on Neural Networks (IJCNN)

Authors: Alejandro Barredo-Arrieta; Javier Del Ser

doi: 10.1109/ijcnn48605.2020.9206728, 10.48550/arxiv.2003.11323

arXiv: http://arxiv.org/abs/2003.11323

Abstract

The last decade has witnessed the proliferation of Deep Learning models in many applications, achieving unrivaled levels of predictive performance. Unfortunately, the black-box nature of Deep Learning models has left unanswered questions about what they learn from data. Certain application scenarios have highlighted the importance of assessing the bounds under which Deep Learning models operate, a problem addressed by assorted approaches aimed at audiences from different domains. However, as the focus of the application shifts toward non-expert users, it becomes mandatory to give such users the means to trust the model, just as a human becomes familiar with a system or process: by understanding the hypothetical circumstances under which it fails. This is the cornerstone of this research work: to undertake an adversarial analysis of a Deep Learning model. The proposed framework constructs counterfactual examples while ensuring their plausibility, i.e., there is a reasonable probability that a human could generate them without resorting to a computer program. This work should therefore be regarded as a valuable auditing exercise of the usable bounds within which a given model is constrained, allowing for a much greater understanding of the capabilities and pitfalls of a model used in a real application. To this end, a Generative Adversarial Network (GAN) and multi-objective heuristics are used to mount a plausible attack on the audited model, efficiently balancing the confusion of this model with the intensity and plausibility of the generated counterfactual. Its utility is showcased in a human face classification task, unveiling the enormous potential of the proposed framework.

7 pages, 5 figures. Accepted for presentation at WCCI 2020
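
The abstract describes a multi-objective trade-off among the confusion of the audited model and the intensity and plausibility of each counterfactual. As a rough illustration only (not the authors' implementation), the sketch below filters scored candidates down to their Pareto front. The score tuple layout, the function names, the example values, and the choice to maximize confusion and plausibility while minimizing intensity are all assumptions made for this example; the record does not specify how the paper defines or optimizes these objectives.

```python
from typing import List, Tuple

# Hypothetical score layout (an assumption, not taken from the paper):
# (confusion, intensity, plausibility)
Scores = Tuple[float, float, float]


def as_maximization(s: Scores) -> Scores:
    """Negate intensity so that all three objectives are 'higher is better'."""
    confusion, intensity, plausibility = s
    return (confusion, -intensity, plausibility)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    ma, mb = as_maximization(a), as_maximization(b)
    no_worse = all(x >= y for x, y in zip(ma, mb))
    strictly_better = any(x > y for x, y in zip(ma, mb))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep only candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up scores for four hypothetical counterfactual candidates.
candidates = [
    (0.9, 0.8, 0.4),  # strong confusion, large change, less plausible
    (0.6, 0.2, 0.9),  # weaker confusion, small change, very plausible
    (0.5, 0.3, 0.8),  # worse than the second candidate on all three objectives
    (0.9, 0.9, 0.3),  # same confusion as the first, but larger change and less plausible
]
front = pareto_front(candidates)
```

Here the first two candidates remain on the front because neither beats the other on every objective; the last two are discarded because another candidate is at least as good on all three.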

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Cryptography and Security (cs.CR), Machine Learning (cs.LG)

Impact by BIP!

- citations: 7. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
