DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computation and Language (cs.CL), Machine Learning (cs.LG)

Peng Tang; Pengkai Zhu; Tian Li; Srikar Appalaraju; Vijay Mahadevan; R. Manmatha

Versions and data sources:
- arXiv.org e-Print Archive: Preprint, 2023 (data source: arXiv.org e-Print Archive)
- https://doi.org/10.18653/v1/20...: Article, 2024, peer-reviewed (data source: Crossref)
- https://dx.doi.org/10.48550/ar...: Article, 2023, license CC BY (data source: Datacite)
- DBLP: Article (data source: DBLP)
- DBLP: Conference object (data source: DBLP)

Type: Article, Preprint, Conference object
Date: 01 Jan 2024
Embargo end date: 01 Jan 2023
Publisher: Association for Computational Linguistics (ACL)
Journal: Findings of the Association for Computational Linguistics: NAACL 2024


doi: 10.18653/v1/2024.findings-naacl.9, 10.48550/arxiv.2311.08623

arXiv: 2311.08623

Abstract

Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder accounts for most of the latency because of auto-regressive decoding. To accelerate inference, we propose an approach that performs Dynamic Early Exit on Decoder (DEED). We build a multi-exit encoder-decoder transformer model that is trained with deep supervision so that each of its decoder layers is capable of generating plausible predictions. In addition, we leverage simple yet practical techniques, including a shared generation head and adaptation modules, to maintain accuracy when exiting at shallow decoder layers. Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence at the current layer at each individual decoding step. Because different numbers of decoder layers may be used at different decoding steps, we compute the deeper-layer decoder features of previous decoding steps just-in-time, which ensures that features from different decoding steps are semantically aligned. We evaluate our approach with two state-of-the-art encoder-decoder transformer models on various VL tasks. We show that our approach can reduce overall inference latency by 30%-60% with comparable or even higher accuracy than the baselines.
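The step-level exit rule and the just-in-time computation of deeper-layer features described in the abstract can be illustrated with a small toy sketch. Everything in it is a hypothetical assumption for illustration and is not taken from the paper: the names `ToyMultiExitDecoder`, `layer_forward`, `exit_probs`, `decode`, `ensure`, and `threshold`; the random weights; the mean over previous inputs standing in for causal self-attention; the omission of encoder cross-attention; and the use of the top softmax probability as the confidence score. It only shows the control flow: a per-layer adapter feeding one shared head, an exit as soon as confidence reaches the threshold, and missing deeper-layer features of earlier steps being rebuilt on demand.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMultiExitDecoder:
    """Hypothetical toy multi-exit decoder with random weights (not the DEED model)."""

    def __init__(self, num_layers: int, dim: int, vocab: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.num_layers = num_layers
        self.embed = rand(vocab, dim)                                # token embeddings
        self.layers = [rand(dim, dim) for _ in range(num_layers)]    # one weight per decoder layer
        self.adapters = [rand(dim, dim) for _ in range(num_layers)]  # per-exit adaptation module
        self.head = rand(vocab, dim)                                 # generation head shared by every exit

    def layer_forward(self, idx: int, inputs: List[Vector]) -> Vector:
        # Decoder layer `idx` at the newest step: mix its own input with the mean of
        # its inputs at all steps so far (a crude stand-in for causal self-attention).
        ctx = [sum(col) / len(inputs) for col in zip(*inputs)]
        mixed = [a + b for a, b in zip(inputs[-1], ctx)]
        return [math.tanh(x) for x in matvec(self.layers[idx], mixed)]

    def exit_probs(self, idx: int, h: Vector) -> Vector:
        # Exit after layer `idx`: layer-specific adapter, then the shared head.
        return softmax(matvec(self.head, matvec(self.adapters[idx], h)))


def decode(model: ToyMultiExitDecoder, bos: int, steps: int,
           threshold: float) -> Tuple[List[int], List[int]]:
    # feats[l][s] is the output of layer l at step s; feats[0] holds input embeddings.
    feats: List[Dict[int, Vector]] = [{} for _ in range(model.num_layers + 1)]
    tokens: List[int] = [bos]
    exits: List[int] = []

    def ensure(l: int, s: int) -> Vector:
        # Just-in-time: if an earlier step exited below layer l, its layer-l feature
        # is built now from the same lower-layer inputs full-depth decoding would use.
        if l > 0 and s not in feats[l]:
            inputs = [ensure(l - 1, j) for j in range(s + 1)]
            feats[l][s] = model.layer_forward(l - 1, inputs)
        return feats[l][s]

    for t in range(steps):
        feats[0][t] = model.embed[tokens[-1]]
        for l in range(1, model.num_layers + 1):
            probs = model.exit_probs(l - 1, ensure(l, t))
            conf = max(probs)  # confidence = top softmax probability (assumed)
            if conf >= threshold or l == model.num_layers:
                break
        tokens.append(probs.index(conf))  # greedy token from the exit layer
        exits.append(l)
    return tokens[1:], exits
```

With `threshold=0.0` every step exits after layer 1, while any threshold above 1.0 sends every step through all layers; values in between mix shallow and deep exits. Because `ensure` rebuilds a missing layer-l feature from the same layer-(l-1) inputs that full-depth decoding would use, a later step that goes deeper attends over previous-step features that match full-depth decoding, which is the alignment property the abstract describes. A practical implementation would cache per-layer keys and values instead of recomputing them from raw inputs.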

Related research products (1):
- ofa: software on GitHub (relation: IsRelatedTo)

Impact (BIP!):
- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average
