Decoupled Knowledge Distillation

Publication: Article, Preprint, Conference object
Date: 01 Jun 2022
Embargo end date: 01 Jan 2022
Publisher: IEEE
Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Borui Zhao; Quan Cui; Renjie Song; Yiyu Qiu; Jiajun Liang

doi: 10.1109/cvpr52688.2022.01165, 10.48550/arxiv.2203.08679

arXiv: 2203.08679

Abstract

State-of-the-art distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. To provide a novel viewpoint to study logit distillation, we reformulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD). We empirically investigate and prove the effects of the two parts: TCKD transfers knowledge concerning the "difficulty" of training samples, while NCKD is the prominent reason why logit distillation works. More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their roles more efficiently and flexibly. Compared with complex feature-based methods, our DKD achieves comparable or even better results and has better training efficiency on CIFAR-100, ImageNet, and MS-COCO datasets for image classification and object detection tasks. This paper proves the great potential of logit distillation, and we hope it will be helpful for future research. The code is available at https://github.com/megvii-research/mdistiller.
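The decomposition described in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the mdistiller repository; it is a minimal pure-Python illustration for a single sample, assuming the usual reading of the method: TCKD is the KL divergence between the teacher's and student's binary (target vs. non-target) distributions, NCKD is the KL divergence between their distributions renormalized over the non-target classes, and the classical KD loss equals TCKD plus NCKD weighted by the teacher's total non-target probability. The function names, the `alpha` and `beta` parameters, and their default values are hypothetical, and the temperature-squared scaling commonly applied to distillation losses is omitted for brevity.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def classical_kd(teacher_logits, student_logits, temperature=1.0):
    """Classical KD loss: KL between the full softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)


def decompose(teacher_logits, student_logits, target, temperature=1.0):
    """Return (tckd, nckd, teacher_non_target_mass) for one sample."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Probability mass on all non-target classes, summed directly.
    rest_teacher = sum(p for i, p in enumerate(p_teacher) if i != target)
    rest_student = sum(p for i, p in enumerate(p_student) if i != target)

    # TCKD: binary distribution (target vs. everything else).
    tckd = kl_divergence(
        [p_teacher[target], rest_teacher],
        [p_student[target], rest_student],
    )

    # NCKD: distributions renormalized over the non-target classes only.
    hat_teacher = [p / rest_teacher for i, p in enumerate(p_teacher) if i != target]
    hat_student = [p / rest_student for i, p in enumerate(p_student) if i != target]
    nckd = kl_divergence(hat_teacher, hat_student)

    return tckd, nckd, rest_teacher


def dkd_loss(teacher_logits, student_logits, target,
             alpha=1.0, beta=1.0, temperature=1.0):
    """Decoupled loss: independent weights (hypothetical defaults) on each part."""
    tckd, nckd, _ = decompose(teacher_logits, student_logits, target, temperature)
    return alpha * tckd + beta * nckd
```

In this sketch, the coupling the abstract refers to appears as the factor `rest_teacher`: when the teacher is confident about the target class, that factor is small and NCKD contributes little to the classical loss. Replacing it with a free weight (`beta` here) is what lets the two parts be balanced independently.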

Accepted by CVPR2022, fix typo

Related Organizations

Tsinghua University
Megvii (China)
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Impact by BIP!

selected citations: 341 (derived from selected sources; an alternative to the "influence" indicator)
popularity: Top 0.1% (the "current" impact/attention of the article in the research community at large, based on the underlying citation network)
influence: Top 1% (the overall/total impact of the article in the research community at large, based on the underlying citation network, diachronically)
impulse: Top 0.01% (the initial momentum of the article directly after its publication, based on the underlying citation network)

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
