Reversible Vision Transformers

Publication: Article, Preprint, Conference object
Date: 01 Jun 2022
Embargo end date: 01 Jan 2023
Publisher: IEEE
Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Karttikeya Mangalam; Haoqi Fan; Yanghao Li; Chao-Yuan Wu; Bo Xiong; Christoph Feichtenhofer; Jitendra Malik

doi: 10.1109/cvpr52688.2022.01056, 10.48550/arxiv.2302.04869

arXiv: 2302.04869

Abstract

We present Reversible Vision Transformers, a memory-efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures with efficient memory usage. We adapt two popular models, namely Vision Transformer and Multiscale Vision Transformers, to reversible variants and benchmark extensively across both model sizes and the tasks of image classification, object detection and video classification. Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for training regimes with limited hardware resources. Finally, we find that the additional computational burden of recomputing activations is more than overcome for deeper models, where throughput can increase up to 2.3x over their non-reversible counterparts. Full code and trained models are available at https://github.com/facebookresearch/slowfast. A simpler version that is easy to understand and modify is also available at https://github.com/karttikeya/minREV.

Oral at CVPR 2022, updated version
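The memory decoupling described in the abstract can be illustrated with a small sketch. This record does not spell out the block equations, so the example below assumes the two-stream additive coupling used in reversible residual networks: each block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored for the backward pass. The functions `f` and `g` are hypothetical stand-ins for sub-blocks such as attention and MLP, and all names here are illustrative rather than taken from the authors' code.

```python
import math

def f(x):
    # Hypothetical stand-in for one sub-block (e.g. attention).
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical stand-in for the other sub-block (e.g. MLP).
    return [0.5 * math.sin(v) for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def rev_forward(x1, x2):
    # Additive coupling: each stream is updated using only the other stream.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2

def rev_inverse(y1, y2):
    # Undo the two updates in reverse order to recover the block inputs.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2

def run_stack(x1, x2, depth):
    # Forward pass through `depth` blocks, keeping only the latest pair.
    for _ in range(depth):
        x1, x2 = rev_forward(x1, x2)
    return x1, x2

def recover_inputs(y1, y2, depth):
    # Walk back through the stack, recomputing activations instead of caching.
    for _ in range(depth):
        y1, y2 = rev_inverse(y1, y2)
    return y1, y2
```

In this sketch, only the final pair of activations is held at any time, regardless of `depth`; the cost is the extra recomputation in `recover_inputs`, which corresponds to the "additional computational burden of recomputing activations" that the abstract mentions.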

Related Organizations

University of California, Berkeley
United States

Keywords

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Impact by BIP!

- Selected citations: 22 (derived from selected sources; an alternative to the "Influence" indicator, which also reflects the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
- Popularity: Top 10% (the "current" impact or attention of an article in the research community at large, based on the underlying citation network)
- Influence: Top 10% (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
- Impulse: Top 10% (the initial momentum of an article directly after its publication, based on the underlying citation network)

Open access route: Green

Fields of Science

medical and health sciences

clinical medicine
