
arXiv: 2302.07730
In the past few years we have seen the meteoric appearance of dozens of foundation models of the Transformer family, all of which have memorable and sometimes funny, but not self-explanatory, names. The goal of this paper is to offer a somewhat comprehensive but simple catalog and classification of the most popular Transformer models. The paper also includes an introduction to the most important aspects and innovations in Transformer models. Our catalog will include models that are trained using self-supervised learning (e.g., BERT or GPT3) as well as those that are further trained using a human-in-the-loop (e.g., the InstructGPT model used by ChatGPT).
FOS: Computer and information sciences; Computation and Language (cs.CL)
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
