Downloads provided by UsageCounts
arXiv: 2106.09539
Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning on the task. We show that the best-performing models achieve an unweighted average recall (UAR) of 73.4% and 73.2% in binary classification of valence and arousal, respectively. The results also show that active learning achieves the most consistent performance of the three approaches.
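The abstract reports its results as unweighted average recall (UAR), that is, the mean of the per-class recalls, so that each class counts equally regardless of how many samples it has. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name and the toy labels are illustrative assumptions.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits = defaultdict(int)    # correctly classified samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy, made-up binary valence labels (not data from the paper).
y_true = ["neg", "neg", "neg", "neg", "pos", "pos"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "neg"]
uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the two per-class recalls are 3/4 and 1/2, so the UAR is their plain mean, while accuracy is pulled toward the larger class. This is why UAR is often preferred when emotion classes are imbalanced.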
FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Real-world audio, Speech emotion recognition, Speech analysis, 113 Computer and information sciences, Daylong audio, LENA recorder, ta3123, Computer Science - Sound, Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, ta515, Electrical Engineering and Systems Science - Audio and Speech Processing
| selected citations | Citations derived from selected sources. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 5 |
| downloads | | 10 |

Views provided by UsageCounts