Scaling Vision Transformers

Type: Article, Preprint, Conference object
Date: 01 Jun 2022
Embargo end date: 01 Jan 2021
Publisher: IEEE
Journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xiaohua Zhai; Alexander Kolesnikov; Neil Houlsby; Lucas Beyer

DOI: 10.1109/cvpr52688.2022.01179, 10.48550/arxiv.2106.04560

arXiv: 2106.04560

Abstract

Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining excellent results; therefore, understanding a model's scaling properties is key to designing future generations effectively. While the laws for scaling Transformer language models have been studied, it is unknown how Vision Transformers scale. To address this, we scale ViT models and data, both up and down, and characterize the relationships between error rate, data, and compute. Along the way, we refine the architecture and training of ViT, reducing memory consumption and increasing the accuracy of the resulting models. As a result, we successfully train a ViT model with two billion parameters, which attains a new state-of-the-art ImageNet top-1 accuracy of 90.45%. The model also performs well for few-shot transfer, for example reaching 84.86% top-1 accuracy on ImageNet with only 10 examples per class.
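
Characterizing how error rate relates to compute usually means fitting a curve to measured (compute, error) pairs. The following is a minimal sketch, assuming a saturating power law E = a(C + d)^(-b) + c, where c acts as an error floor that additional compute cannot remove. The function names, the grid-search fitting procedure, and the synthetic data points are hypothetical illustrations, not the authors' code or measurements.

```python
import math


def saturating_power_law(compute, a, b, c, d):
    """Assumed functional form: E = a * (C + d) ** (-b) + c.

    c is the error floor that extra compute cannot remove; d shifts the
    curve so that it stays finite for very small compute budgets.
    """
    return a * (compute + d) ** (-b) + c


def linear_least_squares(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_saturating_power_law(compute, errors, c_grid, d_grid):
    """Hypothetical fitter: grid-search c and d, solve a and b in log space.

    For fixed c and d, log(E - c) = log(a) - b * log(C + d) is linear,
    so a and b follow from ordinary least squares. The (a, b, c, d) with
    the smallest squared error in the original (non-log) space wins.
    """
    best = None
    for c in c_grid:
        if c >= min(errors):  # log(E - c) needs E - c > 0 for every point
            continue
        for d in d_grid:
            xs = [math.log(x + d) for x in compute]
            ys = [math.log(e - c) for e in errors]
            slope, intercept = linear_least_squares(xs, ys)
            a, b = math.exp(intercept), -slope
            sse = sum(
                (saturating_power_law(x, a, b, c, d) - e) ** 2
                for x, e in zip(compute, errors)
            )
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    return best[1:]


# Synthetic, noise-free data generated from known parameters (NOT results
# from the paper): compute in arbitrary units, error as a fraction.
true_params = (0.5, 0.3, 0.1, 1.0)
compute = [2.0 ** k for k in range(11)]
errors = [saturating_power_law(x, *true_params) for x in compute]

a, b, c, d = fit_saturating_power_law(
    compute,
    errors,
    c_grid=[i / 100 for i in range(21)],
    d_grid=[0.0, 0.5, 1.0, 1.5, 2.0],
)
print(f"a={a:.3f} b={b:.3f} c={c:.2f} d={d:.1f}")
```

Because the synthetic points are generated from the assumed law itself, the fit should recover the generating parameters. Real measurements are noisy, so a finer search or a general nonlinear least-squares solver would be the more usual choice; the coarse grid over c and d is used here only to keep the sketch dependency-free.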

Xiaohua, Alex, and Lucas contributed equally; CVPR 2022

Related Organizations

Google (United States)
Google (Switzerland)

Keywords

FOS: Computer and information sciences; Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Impact by BIP!

- Selected citations (derived from selected sources): 338
- Popularity (current impact/attention of the article, based on the citation network): Top 0.1%
- Influence (overall impact of the article, based on the citation network over time): Top 1%
- Impulse (initial momentum directly after publication, based on the citation network): Top 0.01%

Open access: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering