Downloads provided by UsageCounts
handle: 2117/336107
The recognition of emotions in speech is one of the most challenging topics in data science. In this work, we define a pipeline for the study of multimodal speech emotion recognition, using a wide set of features from audio samples and text transcripts. The work aims to study the interaction and contribution of multimodal features, and three types of features have been selected for this purpose: a set of handcrafted features related to speech prosody, classical mel spectrogram acoustic features, and TF-IDF features for text. By combining these three feature types, we evaluate how much each contributes in the presence of the others. This thesis also provides a comparative study of classical machine learning models and neural architectures in terms of performance and learning potential from speech. Finally, it presents an application that provides emotion classification and feedback retrieval for misclassified samples.
Emotion, Artificial intelligence, Speech Emotion Recognition, Anàlisi Emocional Veu, Intel·ligència artificial, Emocions, classificació emocions, Reconeixement emocions, Recognition, Deep Learning, :Informàtica [Àrees temàtiques de la UPC], Machine learning, Multimodal, Aprenentatge automàtic, Intel·ligència Artificial, Veu, Àrees temàtiques de la UPC::Informàtica, Emotion Classification, Emotional Speech Analysis
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 48 |
| downloads | | 91 |

Views provided by UsageCounts