BASPRO: A Balanced Script Producer for Speech Corpus Collection Based on the Genetic Algorithm

Publication: Article, Preprint
Date: 01 Jan 2023 (embargo end date: 01 Jan 2023)
Language: English
Publisher: Emerald
Journal: APSIPA Transactions on Signal and Information Processing, volume 12 (eISSN: 2048-7703)

Authors: Yu-Wen Chen; Hsin-Min Wang; Yu Tsao

doi: 10.1561/116.00000155, 10.48550/arxiv.2301.04120

arXiv: 2301.04120


Abstract

The performance of speech-processing models is heavily influenced by the speech corpus used for training and evaluation. In this study, we propose the BAlanced Script PROducer (BASPRO) system, which automatically constructs a phonetically balanced and rich set of Chinese sentences for collecting Mandarin Chinese speech data. First, we used pretrained natural language processing systems to extract ten-character candidate sentences from a large corpus of Chinese news texts. Then, we applied a genetic algorithm-based method to select 20 phonetically balanced sentence sets, each containing 20 sentences, from the candidate sentences. Using BASPRO, we obtained a recording script called TMNews, which contains 400 ten-character sentences. TMNews covers 84% of the syllables used in the real world, and its syllable distribution has a cosine similarity of 0.96 to the real-world syllable distribution. We converted the script into a speech corpus using two text-to-speech systems. Using the designed speech corpus, we evaluated the performance of speech enhancement (SE) and automatic speech recognition (ASR), which are among the most important regression- and classification-based speech-processing tasks, respectively. The experimental results show that the SE and ASR models trained on the designed speech corpus outperform their counterparts trained on a randomly composed speech corpus.
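The selection step described in the abstract can be illustrated with a minimal sketch. It is not the BASPRO implementation: the abstract does not specify the fitness function or the genetic operators, so the code below assumes a fitness equal to the cosine similarity between the syllable counts of a selected set and a reference ("real-world") distribution, a simple keep-the-top-half selection, and hypothetical names throughout (`select_sentences`, `fitness`, `coverage`, and all parameters). Sentences are assumed to be already converted to lists of syllable strings, and only one sentence set is selected rather than the 20 sets mentioned above.

```python
import math
import random
from collections import Counter


def syllable_distribution(sentences):
    """Count syllables over sentences, each given as a list of syllable strings."""
    counts = Counter()
    for sent in sentences:
        counts.update(sent)
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two syllable-count mappings."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def coverage(selected, reference):
    """Fraction of reference syllables that appear in the selected counts."""
    return len(set(selected) & set(reference)) / len(reference) if reference else 0.0


def fitness(indices, candidates, reference):
    """Assumed fitness: similarity of the selected set's syllables to the reference."""
    dist = syllable_distribution(candidates[i] for i in indices)
    return cosine_similarity(dist, reference)


def select_sentences(candidates, reference, set_size=20, pop_size=30,
                     generations=50, mutation_rate=0.1, seed=0):
    """Toy genetic algorithm; an individual is a list of distinct sentence indices.

    Requires len(candidates) >= set_size and pop_size >= 4.
    """
    rng = random.Random(seed)
    n = len(candidates)
    population = [rng.sample(range(n), set_size) for _ in range(pop_size)]

    def score(ind):
        return fitness(ind, candidates, reference)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            pool = list(dict.fromkeys(p1 + p2))  # crossover: union of both parents
            child = rng.sample(pool, set_size)
            for j in range(set_size):  # mutation: swap in an unused sentence
                if rng.random() < mutation_rate:
                    unused = [k for k in range(n) if k not in child]
                    if unused:
                        child[j] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return max(population, key=score)
```

Calling `select_sentences(candidates, reference)` returns the indices of one selected set; `coverage` and `cosine_similarity` can then be applied to its syllable counts to obtain figures of the same kind as the 84% coverage and 0.96 similarity reported for TMNews.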

Accepted by APSIPA Transactions on Signal and Information Processing.

Related Organizations

Columbia University (United States)
King’s University (United States)
Research Center for Information Technology Innovation, Academia Sinica (Taiwan)
Academia Sinica (Taiwan)


Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, QA75.5-76.95, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Audio and Speech Processing (eess.AS), Electronic computers. Computer science, FOS: Electrical engineering, electronic engineering, information engineering, Neural and Evolutionary Computing (cs.NE), Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing

Related research products (1)

BASPRO software on GitHub (relation: IsRelatedTo)

Impact by BIP!

selected citations: 1
popularity: Average
influence: Average
impulse: Average