Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models

Publication: Article, 08 Jan 2025, English. Publisher: MDPI AG. Journal: Computers, volume 14, page 19 (eISSN: 2073-431X)

Authors: Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus

doi: 10.3390/computers14010019


Abstract

Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, the presence of linguistic ambiguity in the text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study and provides insights into how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts that exhibit different types of linguistic ambiguities with different models and the images they generate, focusing on how the models’ interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images known as the Visual Linguistic Ambiguity Benchmark (V-LAB) dataset. Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize the impact of ambiguity. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field.

Related Organizations

Hamad bin Khalifa University
Qatar
The Hong Kong University of Science and Technology (Guangzhou)
China (People's Republic of)

Keywords

computational linguistics, text-to-image models, diffusion models, prompt engineering, Electronic computers. Computer science, linguistic ambiguity, QA75.5-76.95, natural language processing

Impact by BIP!

- Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator.
- Popularity: Top 10%. Reflects the "current" impact or attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


gold

Related to Research communities

Digital Humanities and Cultural Heritage