Name: Assessing the Agreement Competence of Large Language Models
Keywords: Romance languages, Large Language Models (LLMs), Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural, Targeted syntactic evaluation, Agreement

Publication: Conference object, 01 Jan 2025, English. Publisher: Association for Computational Linguistics (ACL)

Authors: Táboas García, Alba; Wanner, Leo

handle: 2117/440646



Abstract

While the competence of LLMs to cope with agreement constraints has been widely tested in English, only a very limited number of works deal with morphologically rich(er) languages. In this work, we experiment with 25 mono- and multilingual LLMs, applying them to a collection of more than 5,000 test examples that cover the main agreement phenomena in three Romance languages (Italian, Portuguese, and Spanish) and one Slavic language (Russian). We identify which of the agreement phenomena are most difficult for which models and challenge some common assumptions about what makes a good model. The test suites into which the test examples are organized are openly available and can be easily adapted to other agreement phenomena and other languages for further research.

Peer Reviewed

Related Organizations

Universitat Politècnica de Catalunya
Spain


Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average
