DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data

Name: DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data
Creator: Popovic, Dorde

Research software (Software) | 25 Jan 2025 | Publisher: Zenodo

Authors: Popovic, Dorde

DOI: 10.5281/zenodo.14738587, 10.5281/zenodo.14738586

Abstract

Backdoor attacks are among the most effective, practical, and stealthy attacks in deep learning. We consider a practical scenario in which a developer obtains a deep model from a third party and uses it as part of a safety-critical system. The developer wants to inspect the model for potential backdoors before deploying the system. We find that most existing detection techniques make assumptions that do not hold in this scenario. DeBackdoor is a novel framework for detecting backdoors under realistic restrictions. We generate candidate triggers by deductively searching over the space of possible triggers. We construct and optimize a smoothed version of Attack Success Rate as our search objective. Starting from a broad class of template attacks and using only the forward pass of a deep model, we reverse engineer the backdoor attack. We conduct an extensive evaluation on a wide range of attacks, models, and datasets, and our technique performs almost perfectly across these settings.
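
The idea of searching for a trigger with forward passes only, guided by a smoothed Attack Success Rate, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the DeBackdoor code: the smoothing used here (mean softmax probability of the target class), the greedy local search, the patch-style trigger template, and the names `toy_model`, `smoothed_asr`, and `local_search` are all hypothetical. The framework's actual objective and search procedure may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_trigger(x, trigger):
    """Return a copy of x with the trigger positions (index -> value) overwritten."""
    stamped = list(x)
    for index, value in trigger.items():
        stamped[index] = value
    return stamped


def attack_success_rate(model, inputs, trigger, target):
    """Hard ASR: fraction of triggered inputs classified as the target class."""
    hits = 0
    for x in inputs:
        logits = model(apply_trigger(x, trigger))
        predicted = max(range(len(logits)), key=logits.__getitem__)
        hits += predicted == target
    return hits / len(inputs)


def smoothed_asr(model, inputs, trigger, target):
    """Assumed smoothing (not the paper's definition): mean target-class probability."""
    probs = [softmax(model(apply_trigger(x, trigger)))[target] for x in inputs]
    return sum(probs) / len(probs)


def as_trigger(start, values):
    """Hypothetical template: a contiguous patch beginning at position `start`."""
    return {start + i: v for i, v in enumerate(values)}


def mutate(start, values, dim, rng):
    """Either shift the patch by one position or perturb one of its values."""
    values = list(values)
    if rng.random() < 0.3:
        start = min(max(start + rng.choice((-1, 1)), 0), dim - len(values))
    else:
        i = rng.randrange(len(values))
        values[i] = min(max(values[i] + rng.gauss(0.0, 0.3), 0.0), 1.0)
    return start, values


def local_search(model, inputs, target, dim, patch_size, steps, rng):
    """Greedy search that queries the model through forward passes only."""
    start = rng.randrange(dim - patch_size + 1)
    values = [rng.random() for _ in range(patch_size)]
    best_score = smoothed_asr(model, inputs, as_trigger(start, values), target)
    for _ in range(steps):
        cand_start, cand_values = mutate(start, values, dim, rng)
        score = smoothed_asr(model, inputs, as_trigger(cand_start, cand_values), target)
        if score >= best_score:  # keep candidates that do not lower the objective
            start, values, best_score = cand_start, cand_values, score
    return as_trigger(start, values), best_score


def toy_model(x):
    """Hypothetical 2-class model with a planted shortcut on features 5 and 6."""
    return [1.0 + x[0], 12.0 * x[5] * x[6]]


DIM = 8
rng = random.Random(0)
clean_inputs = [[rng.uniform(0.0, 0.2) for _ in range(DIM)] for _ in range(32)]
found_trigger, found_score = local_search(toy_model, clean_inputs, 1, DIM, 2, 500, rng)
print("candidate trigger:", found_trigger)
print("smoothed ASR:", found_score)
print("hard ASR:", attack_success_rate(toy_model, clean_inputs, found_trigger, 1))
```

On this toy model the hard ASR is 0 with no trigger and 1 when positions 5 and 6 are both set to 1.0. The hard ASR is piecewise constant, so a search gets no signal until a candidate actually flips a prediction. The smoothed score changes continuously with the trigger values, which is what allows a search without gradients to make incremental progress.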

Related Organizations

Hamad bin Khalifa University
Qatar

Impact by BIP!

	citations	0
	popularity	Average
	influence	Average
	impulse	Average
