
handle: 11572/399937, 11380/1343706
Owing to the power of vision-language foundation models, e.g., CLIP, the area of image synthesis has recently seen important advances. In particular, for style transfer, CLIP enables transferring more general and abstract styles without collecting style images in advance, as the style can be efficiently described with natural language, and the result is optimized by maximizing the CLIP similarity between the text description and the stylized image. However, directly using CLIP to guide style transfer leads to undesirable artifacts (mainly written words and unrelated visual entities) spread over the image. In this paper, we propose SpectralCLIP, which is based on a spectral representation of the CLIP embedding sequence, in which most of the common artifacts occupy specific frequencies. By masking the band containing these frequencies, we can condition the generation process to adhere to the target style properties (e.g., color, texture, paint stroke, etc.) while excluding the generation of larger-scale structures corresponding to the artifacts. Experimental results show that SpectralCLIP effectively prevents the generation of artifacts in both quantitative and qualitative terms, without impairing the stylization quality. We also apply SpectralCLIP to text-conditioned image generation and show that it prevents written words from appearing in the generated images. Our code is available at https://github.com/zipengxuc/SpectralCLIP.
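The abstract describes filtering the CLIP embedding sequence in the frequency domain and masking the band where artifact-related structure concentrates. The sketch below illustrates that idea with a sequence-wise FFT in PyTorch; the mocked embeddings, tensor shapes, and band indices are assumptions for illustration only, not the paper's configuration (see the linked repository for the actual implementation).

```python
# Minimal sketch of frequency-band masking over a CLIP embedding sequence
# (illustrative only; not the authors' reference implementation).
import torch

def spectral_mask(token_embeds: torch.Tensor, band: tuple) -> torch.Tensor:
    """Zero out a frequency band of an embedding sequence.

    token_embeds: (seq_len, dim) sequence of token embeddings.
    band: (lo, hi) rFFT bin indices to suppress -- illustrative values.
    """
    # Real FFT along the sequence (token) dimension.
    spec = torch.fft.rfft(token_embeds, dim=0)   # (seq_len // 2 + 1, dim), complex
    lo, hi = band
    spec[lo:hi] = 0                              # mask the band carrying artifact-scale structure
    # Back to the token domain, preserving the original sequence length.
    return torch.fft.irfft(spec, n=token_embeds.shape[0], dim=0)

# Toy usage with stand-ins for CLIP visual token embeddings (e.g., 50 tokens x 768 dims).
embeds = torch.randn(50, 768)
filtered = spectral_mask(embeds, band=(4, 12))   # band limits are assumed, not the paper's setting
print(filtered.shape)                            # torch.Size([50, 768])
```

In this sketch the filtered sequence would then replace the raw embeddings when computing the text-image similarity that guides stylization, so that only the style-related frequency content contributes to the loss.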
WACV 2024
Subjects: FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV); spectral perspective; vision-language foundation models; image generation; 3D; Algorithms; Generative models for image, video, etc.; Vision + language and/or other modalities
| Indicator | Value | Description |
|---|---|---|
| citations | 1 | An alternative to the "Influence" indicator; also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). |
| popularity | Average | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. |
| influence | Average | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). |
| impulse | Average | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. |
| views | 11 | |
| downloads | 1 | |