
How can we effectively regularize BERT? Although BERT proves effective across a wide range of NLP tasks, it often overfits when only a small number of training instances are available. A promising direction for regularizing BERT is to prune its attention heads using a proxy score for head importance. However, such methods are usually suboptimal: they rely on an arbitrarily chosen number of attention heads to prune and do not directly optimize for performance. To overcome this limitation, we propose AUBER, an automated BERT regularization method that leverages reinforcement learning to automatically prune the appropriate attention heads from BERT. We also reduce the model complexity and the action search space by proposing a low-dimensional state representation and a dually-greedy training approach. Experimental results show that AUBER outperforms existing pruning methods, achieving up to 9.58% better performance. An ablation study further demonstrates the effectiveness of AUBER's design choices.
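As a concrete illustration of the underlying operation AUBER automates, the sketch below prunes attention heads from a BERT model using the Hugging Face `transformers` library. The specific layer and head indices are placeholders for exposition, not the heads AUBER's reinforcement-learning agent would actually select.

```python
# Minimal sketch of attention-head pruning in BERT with Hugging Face
# transformers. AUBER's contribution is choosing *which* heads to prune
# via reinforcement learning; here the choices are purely illustrative.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Map each layer index to the attention heads to remove.
# bert-base has 12 layers with 12 heads each (head indices 0-11).
heads_to_prune = {
    0: [3, 7],  # remove heads 3 and 7 in layer 0 (hypothetical choice)
    5: [0],     # remove head 0 in layer 5 (hypothetical choice)
}
model.prune_heads(heads_to_prune)

# The affected self-attention modules now carry fewer heads, shrinking
# the parameter count and constraining the model's attention capacity.
print(model.bert.encoder.layer[0].attention.self.num_attention_heads)  # 10
```

Pruning heads in this way acts as a structural regularizer: it permanently removes parameters rather than stochastically masking them as dropout does, which is why selecting the right heads, rather than a fixed arbitrary count, matters for downstream performance.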
Subjects: Computer and information sciences; Artificial Intelligence (cs.AI); Natural Language Processing
