Towards Equipping Transformer with the Ability of Systematic Compositionality

Publication: Article, Preprint, 24 Mar 2024
Embargo end date: 01 Jan 2023
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Journal: Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 18,289-18,297 (issn: 2159-5399, eissn: 2374-3468)

Authors: Huang, Chen; Qin, Peixin; Lei, Wenqiang; Lv, Jiancheng

doi: 10.1609/aaai.v38i16.29788, 10.48550/arxiv.2312.07280

arXiv: 2312.07280

Abstract

Systematic compositionality, the ability to understand unseen compositions of seen primitives, is a key factor in language productivity and human cognition. However, recent evidence shows that Transformers have difficulty generalizing to composed contexts built from seen primitives. To this end, we take the first step by proposing a compositionality-aware Transformer called CAT, together with two novel pre-training tasks that facilitate systematic compositionality. We tentatively provide a successful implementation of a multi-layer CAT built on the popular BERT. Experimental results demonstrate that CAT outperforms baselines on compositionality-aware tasks with minimal impact on effectiveness on standardized language understanding tasks.

Related Organizations

Sichuan University
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

Related Research Products

BERT-pytorch software on GitHub (IsRelatedTo)
rectified-linear-attention software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.