NSP-BERT: A Prompt-based Few-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction

descriptionPublicationkeyboard_double_arrow_right Article , Preprint , Conference object 01 Jan 2021Embargo end date: 01 Jan 2021Publisher:arXivJournal:CoRR, volume abs/2109.03564

Authors: Yi Sun; Yu Zheng 0033; Chao Hao; Hangping Qiu;

doi: 10.48550/arxiv.2109.03564

arXiv: 2109.03564

NSP-BERT: A Prompt-based Few-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction

- Summary
- Subjects
- Related research
  (4)
- Metrics

Abstract

Using prompts to utilize language models to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm. Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using a BERT original pre-training task abandoned by RoBERTa and other models--Next Sentence Prediction (NSP). Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking with ease. Based on the characteristics of NSP-BERT, we offer several quick building templates for various downstream tasks. We suggest a two-stage prompt method for word sense disambiguation tasks in particular. Our strategies for mapping the labels significantly enhance the model's performance on sentence pair tasks. On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods on most of these tasks and comes close to the few-shot methods.

Published at COLING2022, long paper

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)

4 Research products, page 1 of 1

UER-py software on GitHub
IsRelatedTo
FewCLUE software on GitHub
IsRelatedTo
bert software on GitHub
IsRelatedTo
Chinese-BERT-wwm software on GitHub
IsRelatedTo

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	2
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average