LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation

Keywords: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Guangcong Zheng; Xianpan Zhou; Xuewei Li 0003; Zhongang Qi; Ying Shan; Xi Li 0001

arXiv.org e-Print Archive
Preprint, 2023
Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/cvpr52...
Article, 2023, peer-reviewed
License: STM Policy #29
Data sources: Crossref

https://dx.doi.org/10.48550/ar...
Article, 2023
License: CC BY
Data sources: Datacite

DBLP
Article
Data sources: DBLP

DBLP
Conference object
Data sources: DBLP

Publication (Article, Preprint, Conference object), 01 Jun 2023
Embargo end date: 01 Jan 2023
Publisher: IEEE
Journal: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

doi: 10.1109/cvpr52729.2023.02154 , 10.48550/arxiv.2303.17189

arXiv: 2303.17189

Abstract

Recently, diffusion models have achieved great success in image synthesis. However, in layout-to-image generation, where an image often contains a complex scene of multiple objects, achieving strong control over both the global layout map and each detailed object remains a challenging task. In this paper, we propose a diffusion model named LayoutDiffusion that obtains higher generation quality and greater controllability than previous work. To overcome the difficult multimodal fusion of image and layout, we propose to construct a structural image patch with region information and transform the patched image into a special layout that is fused with the normal layout in a unified form. Moreover, we propose a Layout Fusion Module (LFM) and Object-aware Cross Attention (OaCA) to model the relationships among multiple objects; both are designed to be object-aware and position-sensitive, allowing precise control of spatially related information. Extensive experiments show that LayoutDiffusion outperforms the previous SOTA methods on FID and CAS by a relative 46.35% and 26.70% on COCO-stuff and 44.29% and 41.82% on VG, respectively. Code is available at https://github.com/ZGCTroy/LayoutDiffusion.

Accepted by CVPR2023
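
The abstract reports its gains as relative improvements rather than absolute differences. The sketch below shows one common way such a figure is computed, assuming the usual conventions that a lower FID (Fréchet Inception Distance) is better and a higher CAS (Classification Accuracy Score) is better. The function name `relative_improvement` and all input numbers are hypothetical placeholders chosen for easy arithmetic; they are not results from the paper, and the record does not state the exact formula the authors used.

```python
def relative_improvement(baseline: float, new: float, lower_is_better: bool) -> float:
    """Return the improvement of `new` over `baseline` as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # For lower-is-better metrics (assumed: FID) a drop counts as a gain;
    # for higher-is-better metrics (assumed: CAS) a rise counts as a gain.
    delta = (baseline - new) if lower_is_better else (new - baseline)
    return 100.0 * delta / baseline


# Placeholder values for illustration only, NOT taken from the paper.
fid_gain = relative_improvement(baseline=10.0, new=5.0, lower_is_better=True)
cas_gain = relative_improvement(baseline=20.0, new=25.0, lower_is_better=False)
print(f"FID: {fid_gain:.2f}%  CAS: {cas_gain:.2f}%")
```

Read this way, a relative FID gain of 46.35% would mean the reported FID is a little over half that of the previous best method, whatever the absolute values are.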

Related Organizations

Zhejiang Ocean University
China (People's Republic of)

Related research (6 research products)

LAMA software on GitHub (IsRelatedTo)
pytorch_image_classification software on GitHub (IsRelatedTo)
improved-gan software on GitHub (IsRelatedTo)
darknet software on GitHub (IsRelatedTo)
TTUR software on GitHub (IsRelatedTo)
PerceptualSimilarity software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 38. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green
