
This upload contains a TTS model which was trained on the LJ Speech dataset using these transcriptions but with explicit phoneme duration markers removed. The model was trained using tacotron-cli.

The model achieves the following values on the validation set:
MOS naturalness: 3.49 ± 0.28 (GT: 4.17 ± 0.23)
MOS intelligibility: 4.44 ± 0.21 (GT: 4.63 ± 0.19)
mean mel-cepstral distance: 30.96
mean penalty: 0.1341

Files:
101000.pt: checkpoint after 500 epochs with a batch size of 64
1-setup-env.sh: script to install all required tools
2-create-dataset.sh: script to create the base dataset using public resources
3-create-train-val-set.sh: script to create the training set and validation set
4-start-training.sh: script to start training using Tacotron
5-convert-english-to-ipa.sh: script to prepare English texts for synthesis by transcribing them to IPA
6-synthesize.sh: script to synthesize IPA-transcribed text
example-north-wind.zip: contains an example passage which was synthesized using the model

The model is able to synthesize the following symbols:
vowels: i, u, æ, ɑ, ɔ, ə, ɛ, ɪ, ʊ, ʌ
diphthongs: aɪ, aʊ, eɪ, oʊ, ɔɪ
r-colored vowels: ɔr, ər, ɛr, ɪr, ʊr, ʌr
consonants: b, d, dʒ, f, h, j, k, l, m, n, p, r, s, t, tʃ, v, w, z, ð, ŋ, ɡ, ʃ, θ
breaks: SIL0, SIL1, SIL2, SIL3
special characters: . ? ! , : ; - — " ' ( ) [ ]

Each vowel, diphthong and r-colored vowel can have a leading stress symbol (ˈ or ˌ) attached, e.g., ˈoʊ.

Example:
The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak.
ð|ə|SIL0|n|ˈɔr|θ|SIL0|w|ˈɪ|n|d|SIL0|ə|n|d|SIL0|ð|ə|SIL0|s|ˈʌ|n|SIL0|w|ˈʌr|SIL0|d|ɪ|s|p|j|ˈu|t|ɪ|ŋ|SIL0|w|ˈɪ|tʃ|SIL0|w|ˈɑ|z|SIL0|ð|ə|SIL0|s|t|r|ˈɔ|ŋ|ər|,|SIL1|w|ˈɛ|n|SIL0|ə|SIL0|t|r|ˈæ|v|ə|l|ər|SIL0|k|ˈeɪ|m|SIL0|ə|l|ˈɔ|ŋ|SIL0|r|ˈæ|p|t|SIL0|ɪ|n|SIL0|ə|SIL0|w|ˈɔr|m|SIL0|k|l|ˈoʊ|k|.|SIL2
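Because the model only knows the symbols listed above, it can help to check a transcription before synthesis. The following is a minimal Python sketch, not part of the upload: the function names `is_valid_symbol` and `find_unknown_symbols` are hypothetical, and it assumes that symbols are separated by `|` as in the example and that a stress mark may precede only vowels, diphthongs and r-colored vowels. How 6-synthesize.sh itself treats unknown symbols is not stated here.

```python
# Symbol inventory copied from the description above.
VOWELS = {"i", "u", "æ", "ɑ", "ɔ", "ə", "ɛ", "ɪ", "ʊ", "ʌ"}
DIPHTHONGS = {"aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ"}
R_COLORED = {"ɔr", "ər", "ɛr", "ɪr", "ʊr", "ʌr"}
CONSONANTS = {
    "b", "d", "dʒ", "f", "h", "j", "k", "l", "m", "n", "p", "r",
    "s", "t", "tʃ", "v", "w", "z", "ð", "ŋ", "ɡ", "ʃ", "θ",
}
BREAKS = {"SIL0", "SIL1", "SIL2", "SIL3"}
SPECIAL = set(".?!,:;-—\"'()[]")
STRESS_MARKS = {"ˈ", "ˌ"}

# Only these symbol classes may carry a leading stress mark.
STRESSABLE = VOWELS | DIPHTHONGS | R_COLORED
# Symbols that must appear without a stress mark.
UNSTRESSED_ONLY = CONSONANTS | BREAKS | SPECIAL


def is_valid_symbol(symbol):
    """Return True if the symbol is in the inventory (hypothetical helper)."""
    if symbol in UNSTRESSED_ONLY:
        return True
    # Strip at most one leading stress mark, then look up the core symbol.
    if symbol and symbol[0] in STRESS_MARKS:
        symbol = symbol[1:]
    return symbol in STRESSABLE


def find_unknown_symbols(transcription, separator="|"):
    """Return all symbols of a transcription that are not in the inventory.

    Assumes the separator used in the example above ("|").
    """
    return [s for s in transcription.split(separator) if not is_valid_symbol(s)]


if __name__ == "__main__":
    print(find_unknown_symbols("ð|ə|SIL0|n|ˈɔr|θ|SIL0|w|ˈɪ|n|d"))
    print(find_unknown_symbols("x|ˈoʊ|ˈb"))
```

An empty result means every symbol is covered by the inventory; any returned symbols would need to be replaced or re-transcribed before synthesis.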
The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden. The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high throughput calculations.
LJ Speech, IPA, Phonemes, Tacotron, TTS
