Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

Publication: Article, Preprint, 01 Nov 2024

Embargo end date: 01 Jan 2022

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 46, pages 7,283-7,299 (issn: 0162-8828, eissn: 1939-3539)

Authors: Pattarawat Chormai; Jan Herrmann; Klaus-Robert Müller; Grégoire Montavon

doi: 10.1109/tpami.2024.3388275 , 10.48550/arxiv.2212.14855 , 10.14279/depositonce-20480

pmid: 38607718

arXiv: 2212.14855

Abstract

Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting, at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e.g. visual concepts) that are relevant to the prediction. To automatically extract these subspaces, we propose two new analyses, extending principles found in PCA or ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of e.g. variance or kurtosis. This allows for a much stronger focus of the analysis on what the ML model actually uses for predicting, ignoring activations or concepts to which the model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods prove to be practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and three use cases.

17 pages + supplement
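
The contrast the abstract draws between maximizing variance and maximizing relevance can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes, as a simplification not stated in the abstract, that the relevance captured by a subspace with orthonormal basis U is the average inner product between projected activations and projected "context" vectors (here, the gradient of a toy model with respect to the activations), in which case the maximizer is given by the leading eigenvectors of a symmetrized cross-covariance matrix. The function names `relevant_subspace` and `subspace_relevance`, the toy model, and the synthetic data are all hypothetical.

```python
import numpy as np


def relevant_subspace(A, C, d):
    """Orthonormal basis U (D x d) maximizing mean_n (U^T a_n)^T (U^T c_n).

    A: (N, D) activations; C: (N, D) context vectors (an assumption of this sketch).
    """
    n_samples = A.shape[0]
    # Symmetrized cross-covariance between activations and context vectors.
    M = (A.T @ C + C.T @ A) / (2.0 * n_samples)
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # largest first
    return eigvecs[:, order[:d]]


def subspace_relevance(A, C, U):
    """Average relevance captured by the subspace spanned by the columns of U."""
    return float(np.mean(np.sum((A @ U) * (C @ U), axis=1)))


# Synthetic activations: dimension 0 has the largest variance,
# but the toy model f(a) = 0.5 * a[3]**2 depends only on dimension 3.
rng = np.random.default_rng(0)
N, D = 5000, 6
A = rng.normal(size=(N, D))
A[:, 0] *= 5.0
C = np.zeros_like(A)
C[:, 3] = A[:, 3]                             # gradient of the toy model w.r.t. a

U = relevant_subspace(A, C, d=1)              # relevance-maximizing direction

# Plain PCA for comparison: leading eigenvector of the covariance matrix.
P = np.linalg.eigh(np.cov(A, rowvar=False))[1][:, -1:]

print("relevance-maximizing direction:", np.round(U[:, 0], 2))
print("variance-maximizing direction: ", np.round(P[:, 0], 2))
print("relevance captured:", subspace_relevance(A, C, U), "vs", subspace_relevance(A, C, P))
```

In this toy setting the variance-maximizing direction aligns with the high-variance but unused dimension 0, whereas the relevance-maximizing direction aligns with dimension 3, the only one the toy model responds to.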

Keywords

FOS: Computer and information sciences; Computer Science - Machine Learning; 000 Computer science, information science, general works::000 Computer science, knowledge, systems::000 Computer science, information science, general works; disentangled representations; Computer Science - Artificial Intelligence; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; neural networks; explainable AI; Machine Learning (cs.LG); Artificial Intelligence (cs.AI); subspace analysis

3 Research products

vision, software on GitHub (IsRelatedTo)
pytorch-image-models, software on GitHub (IsRelatedTo)
NetDissect-Lite, software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.