
Background: Most quality assessment approaches for Deep Learning (DL) focus on finding misbehaviourinducing inputs. However, it is difficult to clearly understand the causes of misbehaviours, due to the DL software opaqueness. Recent research proposed different techniques to explain DL misbehaviours, producing input explanations either at a “low level” (raw input elements) or at a “high level” (input features). Aims: We aim to compare the similarity between different explanations and assess to what extent they are understandable. Method: We have conducted an empirical study involving 3 state-of-the-art techniques for DL explanation in 13 configurations, applied to 2 different DL tasks. We have also collected answers from 48 questionnaires submitted to SE experts. Results: Low- and high-level techniques provide dissimilar explanations for the same inputs. However, experts deemed none of the explanations as useful in 28% of the cases. Conclusion: Despite the complementarity of existing explanations, further research is needed to produce better explanations.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 2 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
