Self-Supervised Learning of Video-Induced Visual Invariances

Publication: Article, Conference object; 01 Jun 2020
Publisher: IEEE
Journal: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Michael Tschannen; Josip Djolonga; Marvin Ritter; Aravindh Mahendran; Neil Houlsby; Sylvain Gelly; Mario Lucic

doi: 10.1109/cvpr42600.2020.01382

Abstract

We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI). We consider the implicit hierarchy present in the videos and make use of (i) frame-level invariances (e.g. stability to color and contrast perturbations), (ii) shot/clip-level invariances (e.g. robustness to changes in object orientation and lighting conditions), and (iii) video-level invariances (semantic relationships of scenes across shots/clips), to define a holistic self-supervised loss. Training models using different variants of the proposed framework on videos from the YouTube-8M (YT8M) data set, we obtain state-of-the-art self-supervised transfer learning results on the 19 diverse downstream tasks of the Visual Task Adaptation Benchmark (VTAB), using only 1000 labels per task. We then show how to co-train our models jointly with labeled images, outperforming an ImageNet-pretrained ResNet-50 by 0.8 points with 10× fewer labeled images, as well as the previous best supervised model by 3.7 points using the full ImageNet data set.
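
The abstract does not give the form of the individual loss terms, so the following is only a minimal sketch of how a three-level objective of this kind could be combined. Everything in it is an assumption made for illustration rather than the authors' formulation: the cosine-based penalty, the mean pooling of frame embeddings into a shot embedding, the comparison of consecutive shots, the weights `w_frame`, `w_shot`, `w_video`, and all function names. Embeddings are assumed to be non-zero vectors given as plain lists of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def invariance_loss(u, v):
    """Hypothetical penalty: 0 when u and v point the same way, 2 when opposite."""
    return 1.0 - cosine(u, v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def average(values):
    """Mean of a list, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def holistic_loss(video, w_frame=1.0, w_shot=1.0, w_video=1.0):
    """Weighted sum of three illustrative invariance terms.

    `video` is a list of shots. Each shot is a list of frames, and each
    frame is a pair (view_a, view_b) of embeddings of two augmented views
    of that frame (an assumed data layout, not taken from the paper).
    """
    # Assumed shot embedding: mean of the first-view frame embeddings.
    shot_means = [mean_vector([a for a, _ in shot]) for shot in video]

    # (i) Frame level: two perturbed views of one frame should agree.
    frame_term = average(
        [invariance_loss(a, b) for shot in video for a, b in shot]
    )
    # (ii) Shot level: each frame should agree with its shot's embedding.
    shot_term = average(
        [
            invariance_loss(a, m)
            for shot, m in zip(video, shot_means)
            for a, _ in shot
        ]
    )
    # (iii) Video level: consecutive shots of one video should be related.
    video_term = average(
        [invariance_loss(m1, m2) for m1, m2 in zip(shot_means, shot_means[1:])]
    )

    total = w_frame * frame_term + w_shot * shot_term + w_video * video_term
    return total, (frame_term, shot_term, video_term)


# Usage: a toy video with two shots and 2-dimensional embeddings.
toy_video = [
    [([1.0, 0.0], [0.9, 0.1]), ([0.8, 0.2], [1.0, 0.0])],
    [([0.0, 1.0], [0.1, 0.9])],
]
total, (frame_term, shot_term, video_term) = holistic_loss(toy_video)
print(total, frame_term, shot_term, video_term)
```

Keeping the three terms separate and returning them alongside the total makes it easy to switch one level off by setting its weight to zero, which mirrors the idea of training different variants of a framework.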

Related Organizations

Google (United States)
United States

Impact by BIP!

selected citations (derived from selected sources): 23
popularity (the "current" impact or attention of the article, based on the underlying citation network): Top 10%
influence (the overall impact of the article in the research community at large, based on the underlying citation network): Top 10%
impulse (the initial momentum of the article directly after its publication, based on the underlying citation network): Top 10%