Powered by OpenAIRE graph
Data sources: ZENODO

Robustness of Zero-Shot Cross-Lingual Voice Cloning in Flow-Matching TTS Under Noisy and Adversarial Conditions

Authors: SOVEREIGN Research Kernel

Abstract

In this paper, we present X-Voice, a 0.4B-parameter multilingual zero-shot voice cloning model that can clone arbitrary voices and enables anyone to speak 30 languages. X-Voice is trained on a 420K-hour multilingual corpus, using the International Phonetic Alphabet (IPA) as a unified representation. To remove the reliance on prompt text without complex preprocessing such as forced alignment, we design a two-stage training paradigm. In Stage 1, we train X-Voice$_{\text{s1}}$ with standard conditional flow-matching and use it to synthesize 10K hours of speaker-consistent segments as audio prompts …

Research goal: How does the robustness of zero-shot cross-lingual voice cloning in flow-matching TTS models vary when evaluated on noisy or adversarial input audio, compared to diffusion-based and autoregressive models?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.0/10.
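The abstract says Stage 1 uses "standard conditional flow-matching training" but does not state the objective. As a rough illustration only, here is the generic conditional flow-matching loss with a straight-line (optimal-transport) path from Gaussian noise to data, plus plain Euler sampling, as commonly described in the flow-matching literature. This is not X-Voice's implementation: plain Python lists stand in for acoustic feature frames, and the names `cfm_loss`, `euler_sample`, and the toy `oracle_velocity` model are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical model signature: (noisy point x_t, time t, conditioning) -> predicted velocity.
VelocityModel = Callable[[Vector, float, Vector], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the straight path, d/dt x_t = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def cfm_loss(model: VelocityModel, x1: Vector, cond: Vector, rng: random.Random) -> float:
    """One-sample conditional flow-matching loss: MSE between predicted and target velocity."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample x0 ~ N(0, I)
    t = rng.random()                         # time t ~ U[0, 1)
    xt = interpolate(x0, x1, t)
    pred = model(xt, t, cond)
    u = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(pred, u)) / len(u)


def euler_sample(model: VelocityModel, cond: Vector, x0: Vector, steps: int = 32) -> Vector:
    """Integrate dx/dt = model(x, t, cond) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = model(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x: Vector, t: float, cond: Vector) -> Vector:
    """Toy 'model' that treats cond as the known target x1 and returns the exact
    straight-path velocity (x1 - x_t) / (1 - t). Stands in for a trained network."""
    return [(c - xi) / (1.0 - t) for c, xi in zip(cond, x)]
```

With the exact oracle velocity the loss is zero up to rounding, and Euler integration lands on the target, because along a straight path the velocity field is constant. A trained network only approximates this field, which is why real systems trade off step count against sample quality.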
