ZENODO · Software · 2020 · Data source: Datacite

huggingface/transformers: T5 Model, BART summarization example and reduced memory, translation pipeline

Authors: Wolf, Thomas; Debut, Lysandre; Chaumond, Julien; Sanh, Victor; von Platen, Patrick; Augustin, Aymeric; Louf, Rémi; +23 Authors

Abstract

T5 Model (@patrickvonplaten, @thomwolf)

T5 is a powerful encoder-decoder model that casts every NLP problem into a text-to-text format. It achieves state-of-the-art results on a variety of NLP tasks (summarization, question answering, ...). Five sets of pre-trained weights, pre-trained on a multi-task mixture of unsupervised and supervised tasks, are released. In ascending order, from 60 million to 11 billion parameters: t5-small, t5-base, t5-large, t5-3b, t5-11b. T5 can now be used with the translation and summarization pipelines (a usage sketch follows these notes).

Related: paper, official code, model available in Hugging Face's community models, docs.

Big thanks to the original authors, especially @craffel, who helped answer our questions, reviewed PRs, and tested T5 extensively.

New BART checkpoint: bart-large-xsum (@sshleifer)

These weights come from BART fine-tuned on the XSum abstractive summarization challenge, which encourages shorter (more abstractive) summaries. The checkpoint achieves state-of-the-art results on that task (a summarization sketch follows these notes).

BART summarization example with pytorch-lightning (@acarrera94)

New example: BART for summarization, using pytorch-lightning. It trains on CNN/DailyMail and evaluates.

Translation pipeline (@patrickvonplaten)

A new pipeline is available, leveraging the T5 model. The T5 model was added to the summarization pipeline as well (see the pipeline sketch after these notes).

Memory improvements with BART (@sshleifer)

To reduce the memory footprint and the computing power necessary to run inference with BART, several improvements have been made to the model:
• Remove the LM head and use the embedding matrix instead (~200 MB)
• Call the encoder before expanding input_ids (~1 GB)
• SelfAttention only returns weights if config.output_attentions is set (~500 MB)
• Use two separate, smaller decoder attention masks (~500 MB)
• Drop columns that consist exclusively of pad_token_id from input_ids in the evaluate_cnn example (a sketch of this trick follows these notes)

New model: XLMForTokenClassification (@sakares)

A new head was added to XLM: XLMForTokenClassification (see the sketch below).
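As a minimal usage sketch of the text-to-text setup described above, assuming the v2.7-era transformers API with PyTorch installed (class names and generate() defaults may differ in other versions):

```python
# Minimal sketch: using a released T5 checkpoint for a text-to-text task.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 casts every task as text-to-text via a task prefix, e.g. translation:
input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```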
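The new translation pipeline and the T5-backed summarization pipeline can be sketched as follows; the task names match this release, while the example inputs and generation arguments are illustrative:

```python
from transformers import pipeline

# New translation pipeline, backed by T5.
translator = pipeline("translation_en_to_de")
print(translator("The house is wonderful.", max_length=40))

# T5 was also added to the existing summarization pipeline.
summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")
print(summarizer("A long article to condense ...", max_length=60))
```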
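A hedged sketch of abstractive summarization with the new checkpoint; the identifier "bart-large-xsum" matches this release, while later library versions host it as "facebook/bart-large-xsum":

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("bart-large-xsum")
model = BartForConditionalGeneration.from_pretrained("bart-large-xsum")

article = "A long news article to summarize ..."
inputs = tokenizer.batch_encode_plus([article], return_tensors="pt")
# XSum fine-tuning encourages short, highly abstractive summaries.
summary_ids = model.generate(
    inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```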
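The last memory improvement, dropping all-padding columns before inference, can be illustrated with a small helper; this is a reconstruction of the idea, not the exact code from the evaluate_cnn example:

```python
import torch

def trim_pad_columns(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Drop columns that contain only pad_token_id (illustrative helper)."""
    # Keep a column if at least one row has a non-pad token there,
    # shrinking the effective sequence length passed to the model.
    keep = (input_ids != pad_token_id).any(dim=0)
    return input_ids[:, keep]

batch = torch.tensor([[5, 6, 1, 1],
                      [7, 1, 1, 1]])  # 1 = pad_token_id in this toy example
print(trim_pad_columns(batch, pad_token_id=1))
# tensor([[5, 6],
#         [7, 1]])
```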
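A sketch of the new XLMForTokenClassification head; the checkpoint name is a standard XLM release, while num_labels=9 is a hypothetical tag count for a NER-style task:

```python
import torch
from transformers import XLMForTokenClassification, XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
# num_labels=9 is an assumed label count, e.g. for CoNLL-style NER.
model = XLMForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=9)

input_ids = tokenizer.encode("Hello from New York", return_tensors="pt")
logits = model(input_ids)[0]  # tuple outputs in this era: (batch, seq_len, num_labels)
predictions = torch.argmax(logits, dim=-1)
print(predictions)
```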

Impact indicators (BIP!):
• citations: 0 (an alternative to "influence"; also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically)
• popularity: Average (reflects the "current" impact/attention, the "hype", of an article in the research community at large, based on the underlying citation network)
• influence: Average (reflects the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically)
• impulse: Average (reflects the initial momentum of an article directly after its publication, based on the underlying citation network)

Usage (OpenAIRE UsageCounts):
• views: 120
• downloads: 4