
Prior studies in the automatic classification of voice quality have mainly studied support vector machine (SVM) classifiers using the acoustic speech signal as input. Recently, one voice quality classification study was published using neck surface accelerometer (NSA) and speech signals as inputs and using SVMs with hand-crafted glottal source features. The present study examines simultaneously recorded NSA and speech signals in the classification of three voice qualities (breathy, modal, and pressed) using convolutional neural networks (CNNs) as classifier. The study has two goals: (1) to investigate which of the two signals (NSA vs. speech) is more useful in the classification task, and (2) to compare whether deep learning -based CNN classifiers with spectrogram and mel-spectrogram features are able to improve the classification accuracy compared to SVM classifiers using hand-crafted glottal source features. The results indicated that the NSA signal showed better classification of the voice qualities compared to the speech signal, and that the CNN classifier outperformed the SVM classifiers with large margins. The best mean classification accuracy was achieved with mel-spectrogram as input to the CNN classifier (93.8% for NSA and 90.6% for speech).
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT.
Peer reviewed
computational paralinguistics, neck surface accelerometer, Melspectrogram, CNNs, Voice quality
computational paralinguistics, neck surface accelerometer, Melspectrogram, CNNs, Voice quality
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
