<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>

COPY SCRIPT

For further information contact us at helpdesk@openaire.eu

Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 01 Jan 2022Embargo end date: 01 Jan 2022Publisher:arXivFunded by:NSF | Collaborative Research: T...

Authors: Denain, Jean-Stanislas; Steinhardt, Jacob;

doi: 10.48550/arxiv.2206.13498 , 10.5281/zenodo.7982683 , 10.5281/zenodo.6728368 , 10.5281/zenodo.6728369

arXiv: http://arxiv.org/abs/2206.13498

Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior

- Summary
- Subjects
- Related research
  (3)
- Metrics

Abstract

Model visualizations provide information that outputs alone might miss. But can we trust that model visualizations reflect model behavior? For instance, can they diagnose abnormal behavior such as planted backdoors or overregularization? To evaluate visualization methods, we test whether they assign different visualizations to anomalously trained models and normal models. We find that while existing methods can detect models with starkly anomalous behavior, they struggle to identify more subtle anomalies. Moreover, they often fail to recognize the inputs that induce anomalous behavior, e.g. images containing a spurious cue. These results reveal blind spots and limitations of some popular model visualizations. By introducing a novel evaluation framework for visualizations, our work paves the way for developing more reliable model transparency methods in the future.

Fixed backdoor localization results, made changes to abstract and introduction

Related Organizations

University of California, Berkeley
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)

3 Research products, page 1 of 1

lucid software on GitHub
IsRelatedTo
robustness software on GitHub
IsRelatedTo
auditing-vis software on GitHub
IsRelatedTo

Impact byBIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average