
Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when the supervision is noisy, because the distilled information might not be relevant. In fact, recent research shows that networks can easily fit all labels, including corrupted ones, and hence generalize poorly to clean data. In this paper, we focus on the problem of learning with noisy labels and introduce a compression inductive bias into network architectures to alleviate this over-fitting problem. More precisely, we revisit a classical regularization technique, Dropout, and its variant Nested Dropout. Dropout can serve as a compression constraint through its feature-dropping mechanism, while Nested Dropout further learns feature representations that are ordered by feature importance. Moreover, models trained with compression regularization are further combined with Co-teaching for an additional performance boost. Theoretically, we derive a bias-variance decomposition of the objective function under compression regularization and analyze it for both a single model and Co-teaching. This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it explains the performance boost obtained by incorporating compression regularization into Co-teaching. Experiments show that our simple approach achieves comparable or even better performance than state-of-the-art methods on benchmarks with real-world label noise, including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/.
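To make the compression mechanism concrete, below is a minimal PyTorch sketch of the Nested Dropout idea described in the abstract: for each training sample, a truncation index is drawn from a geometric distribution and all feature dimensions beyond it are zeroed, so earlier dimensions survive more often and features are learned in order of importance. The class name `NestedDropout` and the parameter `p` are illustrative assumptions for this sketch, not the authors' released implementation (see the project page for that).

```python
import torch
import torch.nn as nn


class NestedDropout(nn.Module):
    """Illustrative Nested Dropout layer (a sketch, not the authors' exact code).

    For each sample, an index k ~ Geometric(p) is drawn and all feature
    dimensions with index >= k are zeroed. Lower-index dimensions are kept
    more often, which encourages an ordering of features by importance and
    acts as a compression constraint on the representation.
    """

    def __init__(self, p: float = 0.01):
        super().__init__()
        self.p = p  # success probability of the geometric distribution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # no dropping at evaluation time
        batch, dim = x.shape
        # Sample a per-sample truncation index k in [1, dim].
        k = torch.distributions.Geometric(probs=self.p).sample((batch,)).long() + 1
        k = torch.clamp(k, max=dim).to(x.device)
        # Keep dimensions [0, k) and zero out the rest.
        arange = torch.arange(dim, device=x.device).unsqueeze(0)  # (1, dim)
        mask = (arange < k.unsqueeze(1)).float()                  # (batch, dim)
        return x * mask


# Minimal usage example on random features.
if __name__ == "__main__":
    layer = NestedDropout(p=0.05)
    layer.train()
    feats = torch.randn(4, 16)
    print(layer(feats))
```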
Accepted to TNNLS 2022. Project page: https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/
Subjects: Computer Science - Machine Learning (cs.LG); Statistics - Machine Learning (stat.ML); Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Computer Science, Hardware & Architecture; Engineering, Electrical & Electronic. Keywords: label noise, deep learning, compression, information sorting, bias-variance decomposition, noise measurement, principal component analysis, benchmark testing, training, kernel, biological system modeling. Report number: STADIUS-21-143.
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 11 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
