ZENODO
Article . 2026
License: CC BY
Data sources: ZENODO
ZENODO
Article . 2026
License: CC BY
Data sources: Datacite

MITIGATING MODE COLLAPSE TO IMPROVE DIVERSITY IN TEXT-TO-IMAGE GAN OUTPUTS: STRATEGIES IN ARCHITECTURAL DESIGN, TRAINING METHODOLOGIES, AND EVALUATION TECHNIQUES

Authors: SUBUHI KASHIF ANSARI, MANAL AL KHAMMASH, ANJALI APPUKUTTAN, ANNE ANOOP, SANDEEP KUMAR MATHARIYA, SHEELA D V, MOHAMMED SALEH AL ANSARI

Abstract

Text-to-image generation using Generative Adversarial Networks (GANs) has advanced significantly in recent years, enabling image synthesis from textual descriptions. However, mode collapse remains a critical challenge that limits output diversity. This systematic review analyzes strategies to mitigate mode collapse in text-to-image GANs, examining architectural designs, training methodologies, latent-space techniques, and evaluation metrics. The review covers 45 studies published between 2015 and 2025, grouped into four categories: architectural innovations (18 papers), training-based strategies (12 papers), latent-space and loss-function methods (10 papers), and evaluation-centric approaches (5 papers). Findings show that attention-based models, multi-scale architectures, and semantic-spatial models enhance semantic alignment and diversity, though with specific limitations. Training-based approaches, including curriculum learning, adaptive training, gradient penalties, and progressive growing of GANs, help stabilize training and mitigate collapse. Latent-space techniques, such as mode-seeking losses, contrastive losses, and noise manipulation, promote output diversity. However, evaluation metrics such as Fréchet Inception Distance (FID), Inception Score (IS), Learned Perceptual Image Patch Similarity (LPIPS), and Multi-Scale Structural Similarity Index (MS-SSIM) are limited in their ability to capture semantic diversity. Progress in mitigating mode collapse depends on combining architectural design, training stability, and loss-function engineering. Future priorities include developing unified benchmarks for evaluating semantic diversity, exploring hybrid architectures, and designing adaptive training protocols so that text-to-image models become more robust and generate diverse, semantically coherent outputs.
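
To make the idea of a mode-seeking loss more concrete, the following is a minimal sketch in plain Python. It assumes the commonly used ratio formulation, in which the generator is rewarded when two different noise vectors produce visibly different images for the same text condition. The abstract does not specify any particular formulation, so the function names, the L1 distance, the `eps` constant, and the two toy generators are all hypothetical and chosen only for illustration.

```python
import random


def l1_distance(u, v):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def mode_seeking_loss(generator, text_embedding, z_pairs, eps=1e-8):
    """Inverse of the mean (image distance / latent distance) ratio.

    Assumed formulation: a generator that ignores z (a collapsed one)
    gives a ratio near 0 and hence a very large loss, while a generator
    whose outputs vary with z gives a smaller loss.
    """
    ratios = []
    for z1, z2 in z_pairs:
        img1 = generator(text_embedding, z1)
        img2 = generator(text_embedding, z2)
        ratios.append(l1_distance(img1, img2) / (l1_distance(z1, z2) + eps))
    mean_ratio = sum(ratios) / len(ratios)
    return 1.0 / (mean_ratio + eps)


# Hypothetical toy "generators" that map (text, noise) to a flat pixel list.
def collapsed_generator(text_embedding, z):
    # Ignores the noise entirely: every sample for a given text is identical.
    return list(text_embedding)


def diverse_generator(text_embedding, z):
    # Output depends on the noise, so different z give different images.
    return [t + 0.5 * zi for t, zi in zip(text_embedding, z)]


rng = random.Random(0)
text = [0.1, 0.2, 0.3, 0.4]
z_pairs = [
    ([rng.gauss(0, 1) for _ in text], [rng.gauss(0, 1) for _ in text])
    for _ in range(8)
]

collapsed_loss = mode_seeking_loss(collapsed_generator, text, z_pairs)
diverse_loss = mode_seeking_loss(diverse_generator, text, z_pairs)
print(collapsed_loss > diverse_loss)
```

In a real training loop a term of this kind would typically be added to the generator objective with a weighting coefficient, and the distance would be computed on image tensors or feature maps rather than flat Python lists.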

Keywords

Text-to-Image GANs, Mode Collapse, Output Diversity, Architectural Design, Training Methodologies, Evaluation Techniques, Attention Mechanisms, Latent-Space Techniques
