
Text-to-image generation using Generative Adversarial Networks (GANs) has advanced significantly in recent years, enabling image synthesis from textual descriptions. However, mode collapse, in which the generator maps different noise inputs to a narrow set of near-identical outputs, remains a critical challenge that limits output diversity. This systematic review analyzes strategies to mitigate mode collapse in text-to-image GANs, examining architectural designs, training methodologies, latent-space techniques, and evaluation metrics. It covers 45 studies published between 2015 and 2025, grouped into four categories: architectural innovations (18 papers), training-based strategies (12 papers), latent-space and loss-function methods (10 papers), and evaluation-centric approaches (5 papers). The findings show that attention-based, multi-scale, and semantic-spatial architectures improve semantic alignment and diversity, although each carries its own limitations. Training-based approaches, including curriculum learning, adaptive training, gradient penalties, and progressive growing of GANs, help stabilize training and mitigate collapse. Latent-space and loss-function techniques, such as mode-seeking losses, contrastive losses, and noise manipulation, promote output diversity. Evaluation remains a weak point: metrics such as Fréchet Inception Distance (FID), Inception Score (IS), Learned Perceptual Image Patch Similarity (LPIPS), and Multi-Scale Structural Similarity Index (MS-SSIM) are limited in their ability to capture semantic diversity. Overall, progress in mitigating mode collapse depends on combining architectural design, training stability, and loss-function engineering. Future priorities include developing unified benchmarks for evaluating semantic diversity, exploring hybrid architectures, and designing adaptive training protocols, with the aim of building more robust text-to-image models that generate diverse, semantically coherent outputs.
Keywords: Text-to-Image GANs, Mode Collapse, Output Diversity, Architectural Design, Training Methodologies, Evaluation Techniques, Attention Mechanisms, Latent-Space Techniques
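To illustrate how a mode-seeking loss encourages diversity, the sketch below assumes the formulation popularized by MSGAN (Mao et al., 2019): for a fixed text condition, the distance between two generated images is divided by the distance between the noise vectors that produced them, and the generator minimizes the reciprocal of that ratio. The helper names `mean_abs_diff` and `mode_seeking_loss`, the flat-list image representation, the L1 distance, and the `eps` value are illustrative assumptions, not the method of any specific study covered by this review.

```python
def mean_abs_diff(a, b):
    """Mean absolute (L1) difference between two equal-length, flattened vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Hypothetical helper: MSGAN-style mode-seeking term for one text condition.

    img1, img2 are generator outputs for the SAME text embedding but different
    noise vectors z1, z2, flattened to lists of floats (assumed layout).
    The ratio d_img / d_z is small when distinct noise yields near-identical
    images (mode collapse); returning its reciprocal gives a term that the
    generator can minimize to push that ratio up.
    """
    d_z = mean_abs_diff(z1, z2)
    if d_z == 0:
        raise ValueError("z1 and z2 must differ")
    ratio = mean_abs_diff(img1, img2) / d_z
    return 1.0 / (ratio + eps)  # eps avoids division by zero when images match


# Usage: the same pair of noise vectors with collapsed vs. diverse outputs.
z1, z2 = [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]
collapsed = mode_seeking_loss([0.5] * 4, [0.5] * 4, z1, z2)  # identical images
diverse = mode_seeking_loss([0.0] * 4, [1.0] * 4, z1, z2)    # distinct images
print(collapsed > diverse)  # True: collapse is penalized more heavily
```

In a training loop, such a term would typically be weighted and added to the generator's adversarial loss, with the distances computed on batched framework tensors rather than Python lists.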
