
doi: 10.1111/cgf.15109
AbstractExplaining Machine Learning (ML) — and especially Deep Learning (DL) — classifiers' decisions is a subject of interest across fields due to the increasing ubiquity of such models in computing systems. As models get increasingly complex, relying on sophisticated machinery to recognize data patterns, explaining their behavior becomes more difficult. Directly visualizing classifier behavior is in general infeasible, as they create partitions of the data space, which is typically high dimensional. In recent years, Decision Boundary Maps (DBMs) have been developed, taking advantage of projection and inverse projection techniques. By being able to map 2D points back to the data space and subsequently run a classifier, DBMs represent a slice of classifier outputs. However, we recognize that DBMs without additional explanatory views are limited in their applicability. In this work, we propose augmenting the naive DBM generating process with views that provide more in‐depth information about classifier behavior, such as whether the training procedure is locally stable. We describe our proposed views — which we term Differentiable Decision Boundary Maps — over a running example, explaining how our work enables drawing new and useful conclusions from these dense maps. We further demonstrate the value of these conclusions by showing how useful they would be in carrying out or preventing a dataset poisoning attack. We thus provide evidence of the ability of our proposed views to make DBMs significantly more trustworthy and interpretable, increasing their utility as a model understanding tool.
CCS Concepts, • Mathematics of computing → Dimensionality reduction, • Human-centered computing → Visualization techniques, • Computing methodologies → Machine learning
CCS Concepts, • Mathematics of computing → Dimensionality reduction, • Human-centered computing → Visualization techniques, • Computing methodologies → Machine learning
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 1 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
