
arXiv: 2301.04120
The performance of speech-processing models is heavily influenced by the speech corpus used for training and evaluation. In this study, we propose the BAlanced Script PROducer (BASPRO) system, which can automatically construct a phonetically balanced and rich set of Chinese sentences for collecting Mandarin Chinese speech data. First, we used pretrained natural language processing systems to extract ten-character candidate sentences from a large corpus of Chinese news texts. Then, we applied a genetic algorithm-based method to select 20 phonetically balanced sentence sets, each containing 20 sentences, from the candidate sentences. Using BASPRO, we obtained a recording script called TMNews, which contains 400 ten-character sentences. TMNews covers 84% of the syllables used in the real world, and its syllable distribution has a cosine similarity of 0.96 to the real-world syllable distribution. We converted the script into a speech corpus using two text-to-speech systems. Using the designed speech corpus, we evaluated performance on speech enhancement (SE) and automatic speech recognition (ASR), which are among the most important regression- and classification-based speech-processing tasks, respectively. The experimental results show that the SE and ASR models trained on the designed speech corpus outperform their counterparts trained on a randomly composed speech corpus.
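The two script-quality figures quoted in the abstract (syllable coverage and cosine similarity to a reference syllable distribution) can be illustrated with a minimal sketch. This is not the BASPRO implementation: the function names, the toy pinyin syllables, and the exact definition of coverage (fraction of reference syllables that appear at least once in the script) are assumptions made for illustration only.

```python
import math
from collections import Counter


def syllable_distribution(sentences):
    """Relative frequency of each syllable over a list of syllabified sentences."""
    counts = Counter(syl for sentence in sentences for syl in sentence)
    total = sum(counts.values())
    return {syl: n / total for syl, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse distributions stored as dicts."""
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def coverage(script_dist, reference_dist):
    """Assumed definition: share of reference syllables present in the script."""
    return len(set(script_dist) & set(reference_dist)) / len(reference_dist)


# Hypothetical toy data: a reference distribution and a two-sentence script.
reference = {"ni": 0.4, "hao": 0.3, "ma": 0.2, "shi": 0.1}
script = [["ni", "hao"], ["ni", "ma"]]

script_dist = syllable_distribution(script)
print("coverage:", coverage(script_dist, reference))
print("cosine similarity:", round(cosine_similarity(script_dist, reference), 3))
```

A selection procedure such as the genetic algorithm mentioned in the abstract could use scores like these as part of its fitness function, although the abstract does not state the objective actually used.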
accepted by APSIPA Transactions on Signal and Information Processing
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, QA75.5-76.95, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Audio and Speech Processing (eess.AS), Electronic computers. Computer science, FOS: Electrical engineering, electronic engineering, information engineering, Neural and Evolutionary Computing (cs.NE), Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing
