Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion

Keywords: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Kamble, Anand; Tathe, Aniket; Kumbharkar, Suyash; Bhandare, Atharva; Mitra, Anirban C.


arXiv.org e-Print Archive

Preprint . 2023

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2023

License: CC BY

Data sources: Datacite


Publication type: Article, Preprint
Date: 01 Jan 2023
Embargo end date: 01 Jan 2023
Publisher: arXiv


doi: 10.48550/arxiv.2311.14836

arXiv: 2311.14836


Abstract

This paper proposes two innovative methodologies to construct customized Common Voice datasets for low-resource languages like Hindi. The first methodology leverages Bark, a transformer-based text-to-audio model developed by Suno, and incorporates Meta's EnCodec and a pre-trained HuBERT model to enhance Bark's performance. The second methodology employs Retrieval-Based Voice Conversion (RVC) and uses the Ozen toolkit for data preparation. Both methodologies contribute to the advancement of ASR technology and offer valuable insights into addressing the challenges of constructing customized Common Voice datasets for under-resourced languages. Furthermore, they provide a pathway to achieving high-quality, personalized voice generation for a range of applications.
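As a rough illustration of what "constructing a customized Common Voice dataset" from synthesized speech could look like, the sketch below writes audio clips plus a tab-separated manifest with `client_id`, `path`, and `sentence` columns. It is not the authors' pipeline: `placeholder_synthesize`, `build_manifest`, the file naming, the 16 kHz rate, and the use of WAV instead of Common Voice's MP3 clips are all assumptions made for this example. In a real setup the placeholder would be swapped for a text-to-audio model such as Bark, optionally followed by voice conversion.

```python
import csv
import math
import struct
import wave
from pathlib import Path

SAMPLE_RATE = 16_000  # assumed sample rate for this sketch


def placeholder_synthesize(text):
    """Hypothetical stand-in for a TTS call (e.g. Bark).

    Returns a short sine tone as floats in [-1, 1]; the length grows
    with the number of words so different sentences give different clips.
    """
    duration_s = 0.05 * max(len(text.split()), 1)
    n_samples = int(SAMPLE_RATE * duration_s)
    return [0.3 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)
            for i in range(n_samples)]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM audio from float samples."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


def build_manifest(sentences, out_dir, synthesize=placeholder_synthesize,
                   client_id="synthetic-speaker-0"):
    """Synthesize one clip per sentence and write a Common Voice-style TSV."""
    out_dir = Path(out_dir)
    clips_dir = out_dir / "clips"
    clips_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for idx, sentence in enumerate(sentences):
        clip_name = f"synthetic_{idx:06d}.wav"  # hypothetical naming scheme
        write_wav(clips_dir / clip_name, synthesize(sentence))
        rows.append({"client_id": client_id,
                     "path": clip_name,
                     "sentence": sentence})

    manifest = out_dir / "train.tsv"
    with open(manifest, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["client_id", "path", "sentence"],
                                delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return manifest
```

Passing the synthesizer in as a function keeps the manifest logic independent of the model, so a Bark-based or RVC-based generator could be substituted without changing how the dataset is laid out.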

Related Organizations

University of Florida, United States
MES College of Engineering, India
Technische Hochschule Ingolstadt, Germany
Florida State University, United States



Related research products (5)

Retrieval-based-Voice-Conversion-WebUI, software on GitHub (IsRelatedTo)
bark, software on GitHub (IsRelatedTo)
bark-with-voice-clone, software on GitHub (IsRelatedTo)
spleeter, software on GitHub (IsRelatedTo)
ozen-toolkit, software on GitHub (IsRelatedTo)

Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average

