Iconicity in large language models

Keywords: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Artificial Intelligence, Computation and Language, Computation and Language (cs.CL)

Anna Marklová; Jiří Milička; Leonid Ryvkin; L’udmila Lacková Bennet; Libuše Kormaníková

Digital Scholarship in the Humanities: Article, 2025, peer-reviewed. License: CC BY. Data source: Crossref

arXiv.org e-Print Archive: Preprint, 2025. Data source: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...: Article, 2025. License: CC BY. Data source: Datacite

DBLP: Article. Data source: DBLP

Article, Preprint. 18 Sep 2025. Embargo end date: 01 Jan 2025. Language: English. Publisher: Oxford University Press (OUP). Journal: Digital Scholarship in the Humanities, volume 40, pages 1203-1224 (ISSN: 2055-7671, eISSN: 2055-768X).

DOI: 10.1093/llc/fqaf095, 10.48550/arxiv.2501.05643

arXiv: 2501.05643

Abstract

Lexical iconicity, a direct relation between a word’s meaning and its form, is an important aspect of every natural language and most commonly manifests through sound-meaning associations. Because large language models’ (LLMs’) access to both the meaning and the sound of text is only mediated (meaning through textual context, sound through written representation, further complicated by tokenization), we might expect their encoding of iconicity to be either insufficient or significantly different from human processing. This study tests this hypothesis by having GPT-4 generate highly iconic pseudowords in artificial languages. To verify that these words actually carry iconicity, we asked Czech and German participants (n = 672), and subsequently LLM-based participants (generated by GPT-4 and Claude 3.5 Sonnet), to guess their meanings. The results show that humans guess the meanings of pseudowords in the generated iconic language more accurately than those of words in distant natural languages, and that LLM-based participants are even more successful than humans at this task. This core finding is accompanied by several additional analyses concerning the universality of the generated language and the cues that both human and LLM-based participants use.

Related Organizations

Palacký University, Olomouc (Czech Republic)
Camille Jordan Institute (France)
Charles University (Czech Republic)

Impact (BIP!): selected citations: 1; popularity: Average; influence: Average; impulse: Average.

Related research communities

Aurora Universities Network

Digital Humanities and Cultural Heritage

Knowmad Institut