Self-Supervised Learning of Video-Induced Visual Invariances

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)

Tschannen, Michael; Djolonga, Josip; Ritter, Marvin; Mahendran, Aravindh; Zhai, Xiaohua; Houlsby, Neil; Gelly, Sylvain; Lucic, Mario


arXiv.org e-Print Archive

Preprint . 2019

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2019

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

Publication: Article, Preprint, 01 Jan 2019. Embargo end date: 01 Jan 2019. Publisher: arXiv


doi: 10.48550/arxiv.1912.02783

arXiv: 1912.02783


Abstract

We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI). We consider the implicit hierarchy present in the videos and make use of (i) frame-level invariances (e.g. stability to color and contrast perturbations), (ii) shot/clip-level invariances (e.g. robustness to changes in object orientation and lighting conditions), and (iii) video-level invariances (semantic relationships of scenes across shots/clips), to define a holistic self-supervised loss. Training models using different variants of the proposed framework on videos from the YouTube-8M (YT8M) data set, we obtain state-of-the-art self-supervised transfer learning results on the 19 diverse downstream tasks of the Visual Task Adaptation Benchmark (VTAB), using only 1000 labels per task. We then show how to co-train our models jointly with labeled images, outperforming an ImageNet-pretrained ResNet-50 by 0.8 points with 10x fewer labeled images, as well as the previous best supervised model by 3.7 points using the full ImageNet data set.

CVPR 2020
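
The three invariance levels named in the abstract can be pictured as three terms of one combined loss. The following is a minimal illustrative sketch only, not the authors' implementation: the cosine-distance terms, the mean pooling of frames into a shot embedding, the equal default weights, and every function name (`frame_loss`, `shot_loss`, `video_loss`, `combined_loss`) are assumptions made for this example. A video is represented here as a list of shots, and each shot as a list of frame embeddings.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_pool(vectors):
    """Average a list of equal-length vectors (assumed pooling choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def frame_loss(frame_emb, perturbed_emb):
    """Frame level: a frame and its color/contrast-perturbed copy should agree."""
    return cosine_distance(frame_emb, perturbed_emb)

def shot_loss(frame_embs):
    """Shot level: frames of one shot should stay close to the pooled shot embedding."""
    center = mean_pool(frame_embs)
    return sum(cosine_distance(f, center) for f in frame_embs) / len(frame_embs)

def video_loss(shot_embs):
    """Video level: consecutive shots of one video should be related."""
    pairs = list(zip(shot_embs, shot_embs[1:]))
    if not pairs:  # a single-shot video contributes no video-level term
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

def combined_loss(video, perturbed, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical.

    video and perturbed have the same shape: shots -> frames -> embedding.
    """
    frame_terms = [
        frame_loss(f, p)
        for shot, p_shot in zip(video, perturbed)
        for f, p in zip(shot, p_shot)
    ]
    l_frame = sum(frame_terms) / len(frame_terms)
    l_shot = sum(shot_loss(shot) for shot in video) / len(video)
    l_video = video_loss([mean_pool(shot) for shot in video])
    w_frame, w_shot, w_video = weights
    return w_frame * l_frame + w_shot * l_shot + w_video * l_video

# Two single-frame shots whose embeddings are orthogonal: only the
# video-level term is non-zero.
toy_video = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(combined_loss(toy_video, toy_video))
```

A loss built purely from agreement terms like these is minimized by mapping every input to the same embedding, so a practical objective would also need some mechanism against that collapse; this sketch omits it.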



