Decoupled Kullback-Leibler Divergence Loss

Publication: Article, Preprint, Conference object
Date: 01 Jan 2024
Embargo end date: 01 Jan 2023
Publisher: Neural Information Processing Systems Foundation, Inc. (NeurIPS)
Journal: Advances in Neural Information Processing Systems 37

Authors: Jiequan Cui; Zhuotao Tian; Zhisheng Zhong; Xiaojuan Qi; Bei Yu; Hanwang Zhang

doi: 10.52202/079017-2370, 10.48550/arxiv.2305.13948

arXiv: 2305.13948

Abstract

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. Thanks to the decomposed formulation of the DKL loss, we identify two areas for improvement. First, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property. This modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Second, we introduce class-wise global information into KL/DKL to mitigate bias from individual samples. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness through experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art adversarial robustness on the public RobustBench leaderboard and competitive performance on knowledge distillation, demonstrating its substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
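The decomposition claimed in the abstract can be checked numerically on a toy example. The sketch below is not the authors' implementation, and the abstract does not state the exact form of the two terms. It assumes that (a) the wMSE term is one quarter of the sum over class pairs (j, k) of w[j][k] times the squared gap between the pairwise logit differences of the two outputs, with weights w[j][k] = s_m[j] * s_m[k] held constant, (b) the Cross-Entropy term uses the softmax of the first output as a fixed soft label, and (c) "equivalent" means that the gradients agree. The logits `o_m`, `o_n` and all function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_loss(o_m, o_n):
    # KL(s_m || s_n), where s_m and s_n are the softmaxes of the two logit vectors.
    s_m, s_n = softmax(o_m), softmax(o_n)
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(s_m, s_n))

def wmse_loss(o_m, o_n, w):
    # Assumed wMSE form: pairwise logit differences, weights w treated as constants.
    c = len(o_m)
    total = 0.0
    for j in range(c):
        for k in range(c):
            diff = (o_m[j] - o_m[k]) - (o_n[j] - o_n[k])
            total += w[j][k] * diff * diff
    return total / 4.0

def soft_ce_loss(o_n, s_m):
    # Cross-entropy of softmax(o_n) against the fixed soft label s_m.
    s_n = softmax(o_n)
    return -sum(p * math.log(q) for p, q in zip(s_m, s_n))

def num_grad(f, x, eps=1e-6):
    # Central finite-difference gradient of f at x.
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

# Hypothetical logits for a 4-class problem.
o_m = [2.0, 0.5, -1.0, 0.3]
o_n = [1.2, 0.9, -0.4, -0.7]

# Quantities frozen at the current point (emulating a stop-gradient).
s_m = softmax(o_m)
w = [[a * b for b in s_m] for a in s_m]

# Gradient with respect to o_m: KL versus the wMSE term alone.
g_kl_m = num_grad(lambda v: kl_loss(v, o_n), o_m)
g_wmse_m = num_grad(lambda v: wmse_loss(v, o_n, w), o_m)

# Gradient with respect to o_n: KL versus the soft-label cross-entropy alone.
g_kl_n = num_grad(lambda v: kl_loss(o_m, v), o_n)
g_ce_n = num_grad(lambda v: soft_ce_loss(v, s_m), o_n)

max_gap_m = max(abs(a - b) for a, b in zip(g_kl_m, g_wmse_m))
max_gap_n = max(abs(a - b) for a, b in zip(g_kl_n, g_ce_n))
print("max gradient gap w.r.t. o_m:", max_gap_m)
print("max gradient gap w.r.t. o_n:", max_gap_n)
```

Under these assumptions both gaps should be at the level of finite-difference noise: the wMSE term alone reproduces the KL gradient with respect to the first output, and the soft-label Cross-Entropy term alone reproduces it with respect to the second. This also illustrates the asymmetry the abstract mentions: if the first output is a fixed teacher, as in knowledge distillation, only the Cross-Entropy term contributes a gradient.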

NeurIPS 2024

Related Organizations

Chinese University of Hong Kong
China (People's Republic of)
The University of Hong Kong
The Chinese University of Hong Kong
Hong Kong
Nanyang Technological University
Singapore

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)

Impact by BIP!

selected citations: 30
popularity: Top 10%
influence: Top 10%
impulse: Top 1%

Open Access route: Green