Grounding Representation Similarity with Statistical Testing

Publication: Article, Preprint, 01 Jan 2021
Embargo end date: 01 Jan 2021
Publisher: arXiv
Journal: CoRR, volume abs/2108.01661
Funded by: NSF | Collaborative Research: T..., NSF | Graduate Research Fellows...

Authors: Ding, Frances; Denain, Jean-Stanislas; Steinhardt, Jacob

doi: 10.48550/arxiv.2108.01661, 10.5281/zenodo.5117844, 10.5281/zenodo.5117843

arXiv: 2108.01661

Abstract

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures. Unfortunately, these widely used measures often disagree on fundamental observations, such as whether deep networks differing only in random initialization learn similar representations. These disagreements raise the question: which, if any, of these dissimilarity measures should we believe? We provide a framework to ground this question through a concrete test: measures should have sensitivity to changes that affect functional behavior, and specificity against changes that do not. We quantify this through a variety of functional behaviors including probing accuracy and robustness to distribution shift, and examine changes such as varying random initialization and deleting principal components. We find that current metrics exhibit different weaknesses, note that a classical baseline performs surprisingly well, and highlight settings where all metrics appear to fail, thus providing a challenge set for further improvement.

Accepted at NeurIPS 2021. 10 pages, 3 figures
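
The abstract names centered kernel alignment (CKA) as one of the measures being compared. As a rough illustration of what such a measure computes, the sketch below implements the standard linear CKA formula, ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F ||Yc^T Yc||_F) on column-centered representations, and turns it into a dissimilarity as 1 - CKA. The function names, the toy data, and the pure-standard-library style are assumptions made for this example; they are not taken from the authors' code, and real experiments would normally use an array library.

```python
import math
import random


def center_columns(X):
    """Subtract each column (feature) mean; X is a list of rows (examples)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def cross_frobenius_sq(X, Y):
    """Squared Frobenius norm of X^T Y, summed over all pairs of feature columns."""
    return sum(
        sum(a * b for a, b in zip(x_col, y_col)) ** 2
        for x_col in zip(*X)
        for y_col in zip(*Y)
    )


def linear_cka(X, Y):
    """Linear CKA between two representations of the same examples (rows aligned)."""
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_frobenius_sq(Xc, Yc)
    denominator = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    return numerator / denominator


def cka_dissimilarity(X, Y):
    """Hypothetical helper: 1 - linear CKA, so 0 means 'identical' under this measure."""
    return 1.0 - linear_cka(X, Y)


# Toy data (illustrative only): 50 examples with 2-dimensional representations.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]

# A rotated and rescaled copy of X: a change that linear CKA is invariant to.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)
rotated = [[3 * (x * c - y * s), 3 * (x * s + y * c)] for x, y in X]

# An independent random representation of the same 50 examples.
unrelated = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]

print("rotated copy:", cka_dissimilarity(X, rotated))
print("unrelated:   ", cka_dissimilarity(X, unrelated))
```

In the abstract's terms, the rotated copy plays the role of a change that should not register (specificity) and the unrelated representation a change that should (sensitivity). The toy data cannot show whether either change affects functional behavior, which is what the paper's test actually measures.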

Related Organizations

University of California, Berkeley
United States
Johns Hopkins University
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)

6 Research products, page 1 of 1

netics software on GitHub
IsRelatedTo
vision software on GitHub
IsRelatedTo
bert software on GitHub
IsRelatedTo
hans software on GitHub
IsRelatedTo
hans software on GitHub
IsRelatedTo
acl2021-instance-level software on GitHub
IsRelatedTo

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.