
doi: 10.21227/s2r2-hg86
"This dataset contains 1248 speech audio samples synthetically generated by text-to-speech (TTS) systems. In each sample, the emotion expressed by the transcription is incongruent with the emotion expressed by the tone of voice. To generate the samples, we use emotion-rich sentences divided into four distinct emotions: angry, happy, neutral, and sad. For each sentence, we employ three different TTS systems to generate speech in the same four emotions, thus resulting in three emotionally incongruent speech samples per sentence. Unlike standard emotional speech samples that are used to train and test emotion recognition systems, this dataset provides incongruency between the sentiment present in the tone of the voice and that present in the transcription of the sample."
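The incongruent pairing described above can be illustrated with a minimal Python sketch. It assumes that a sample counts as incongruent when the emotion of the sentence text differs from the emotion of the synthesized voice. The function names are hypothetical, and the sketch does not reproduce the dataset's actual generation pipeline or file layout.

```python
from itertools import product

# The four emotion categories named in the dataset description.
EMOTIONS = ("angry", "happy", "neutral", "sad")


def incongruent_voice_emotions(text_emotion, emotions=EMOTIONS):
    """Return the voice emotions that differ from the sentence's own emotion.

    Assumption: "incongruent" means text emotion != voice emotion.
    """
    if text_emotion not in emotions:
        raise ValueError(f"unknown emotion: {text_emotion!r}")
    return [voice for voice in emotions if voice != text_emotion]


def incongruent_pairs(emotions=EMOTIONS):
    """Enumerate every (text_emotion, voice_emotion) pair that is incongruent."""
    return [(text, voice) for text, voice in product(emotions, repeat=2) if text != voice]


if __name__ == "__main__":
    # A sentence written as "happy" can be voiced in three mismatched emotions.
    print(incongruent_voice_emotions("happy"))
    print(len(incongruent_pairs()))
```

Under this assumption, each sentence has three mismatched voice emotions, which is consistent with the "three emotionally incongruent speech samples per sentence" stated in the description.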
| Indicator | Value |
| --- | --- |
| Selected citations | 0 |
| Popularity | Average |
| Influence | Average |
| Impulse | Average |
