Relational Knowledge Distillation

Publication: Article, Preprint, Conference object
Date: 01 Jun 2019
Embargo end date: 01 Jan 2019
Publisher: IEEE
Journal: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wonpyo Park; Dongju Kim; Yan Lu; Minsu Cho

doi: 10.1109/cvpr.2019.00409, 10.48550/arxiv.1904.05068

arXiv: 1904.05068

Abstract

Knowledge distillation aims to transfer knowledge acquired by one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as training the student to mimic the teacher's output activations for individual data examples. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular, for metric learning it allows students to outperform their teachers, achieving state-of-the-art results on standard benchmark datasets.
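
The abstract names distance-wise and angle-wise losses but gives no formulas, so the following is only a minimal sketch of what such relational losses could look like, not the authors' implementation. It assumes that pairwise Euclidean distances are normalized by their mean within each model, that angles are measured as cosines over triplets of examples, and that teacher and student relations are compared with a Huber penalty. All function names (`rkd_distance_loss`, `rkd_angle_loss`, and the helpers) are hypothetical, and the code uses plain Python lists rather than any deep learning framework.

```python
import math
from itertools import permutations

EPS = 1e-12  # guards against division by zero for coincident points


def huber(x, y, delta=1.0):
    """Huber penalty between two scalars (an assumed choice of penalty)."""
    d = abs(x - y)
    return 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)


def normalized_distances(emb):
    """Euclidean distance for every ordered pair i != j, divided by the mean distance."""
    pairs = permutations(range(len(emb)), 2)
    dists = {(i, j): math.dist(emb[i], emb[j]) for i, j in pairs}
    mean = sum(dists.values()) / max(len(dists), 1)
    return {pair: d / max(mean, EPS) for pair, d in dists.items()}


def triplet_cosines(emb):
    """Cosine of the angle at vertex j for every ordered triplet (i, j, k)."""
    out = {}
    for i, j, k in permutations(range(len(emb)), 3):
        u = [a - b for a, b in zip(emb[i], emb[j])]
        v = [a - b for a, b in zip(emb[k], emb[j])]
        nu = max(math.hypot(*u), EPS)
        nv = max(math.hypot(*v), EPS)
        out[(i, j, k)] = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return out


def _mean_huber(t_rel, s_rel):
    """Average Huber penalty between matching teacher and student relations."""
    if not t_rel:
        return 0.0
    return sum(huber(t_rel[key], s_rel[key]) for key in t_rel) / len(t_rel)


def rkd_distance_loss(teacher, student):
    """Distance-wise sketch: compare mean-normalized pairwise distances."""
    return _mean_huber(normalized_distances(teacher), normalized_distances(student))


def rkd_angle_loss(teacher, student):
    """Angle-wise sketch: compare triplet angle cosines."""
    return _mean_huber(triplet_cosines(teacher), triplet_cosines(student))
```

Because both functions compare relations computed within each model's own embedding space, the teacher and student embeddings in this sketch may have different dimensionalities; only the number and order of examples in the batch must match. Under the assumed mean normalization, a student whose embeddings are a uniformly scaled copy of the teacher's incurs zero loss in both terms.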

Accepted to CVPR 2019

Related Organizations

Microsoft (United States), United States
Microsoft Research Asia (China), China (People's Republic of)
Pohang University of Science and Technology, Korea (Republic of)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)

Impact by BIP!

selected citations: 792. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 0.01%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 0.1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.