StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks

descriptionPublicationkeyboard_double_arrow_right Article , Preprint , Conference object 01 Oct 2017Embargo end date: 01 Jan 2016Publisher:IEEEJournal:2017 IEEE International Conference on Computer Vision (ICCV)

Authors: Zhang, Han; Xu, Tao; Li, Hongsheng; Zhang, Shaoting; Wang, Xiaogang; Huang, Xiaolei; Metaxas, Dimitris;

doi: 10.1109/iccv.2017.629 , 10.48550/arxiv.1612.03242

arXiv: 1612.03242

StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks

- Summary
- Subjects
- Metrics

Abstract

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.

ICCV 2017 Oral Presentation

Related Organizations

University of North Carolina at Chapel Hill
United States
Rutgers, The State University of New Jersey
United States
Chinese University of Hong Kong
China (People's Republic of)
Department of Computer Science
Spain
Lehigh University

View all View all

Keywords

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML)

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	2K
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 0.01%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 0.01%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 0.01%

Found an issue? Give us feedback

2K

Top 0.01%

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering