Prevalence of neural collapse during the terminal phase of deep learning training

Publication: Article, Other literature type, Preprint

Date: 21 Sep 2020 (embargo end date: 01 Jan 2020)

Language: English

Publisher: Proceedings of the National Academy of Sciences

Journal: Proceedings of the National Academy of Sciences, volume 117, pages 24,652-24,663 (issn: 0027-8424, eissn: 1091-6490)

Funded by: NSF | Properties of Approximate..., NSF | Estimation and testing in..., NSF | "Big-Data" Asymptotics: T...

Authors: Vardan Papyan; X. Y. Han; David L. Donoho

doi: 10.1073/pnas.2015509117 , 10.48550/arxiv.2008.08186

pmid: 32958680

pmc: PMC7547234

arXiv: 2008.08186


Abstract

Significance

Modern deep neural networks for image classification have achieved superhuman performance. Yet, the complex details of trained networks have forced most practitioners and researchers to regard them as black boxes with little that could be understood. This paper considers in detail a now-standard training methodology: driving the cross-entropy loss to zero, continuing long after the classification error is already zero. Applying this methodology to an authoritative collection of standard deepnets and datasets, we observe the emergence of a simple and highly symmetric geometry of the deepnet features and of the deepnet classifier, and we document important benefits that the geometry conveys, thereby helping us understand an important component of the modern deep learning training paradigm.


Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), Machine Learning (cs.LG), Statistics - Machine Learning, Physical Sciences, Artificial neural networks and deep learning

Related research (3)

- Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data (2023), IsAmongTopNSimilarDocuments
- The Influence of Teacher-Individualizing Practices on Child Developmental Progress (1999), IsAmongTopNSimilarDocuments
- clusterjob software on GitHub, IsRelatedTo

Impact by BIP!

- selected citations: 225. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 0.1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.