Powered by OpenAIRE graph
Journal of Voice
Article . 2025 . Peer-reviewed
License: Elsevier TDM
Data sources: Crossref

A Machine-Learning Algorithm for the Automated Perceptual Evaluation of Dysphonia Severity

Authors: Benjamin van der Woerd; Zhuohao Chen; Nikolaos Flemotomos; Maria Oljaca; Lauren Timmons Sund; Shrikanth Narayanan; Michael M. Johns


Abstract

Auditory-perceptual assessments are the gold standard for assessing voice quality. This project aims to develop a machine-learning model for measuring the perceptual dysphonia severity of audio samples, consistent with assessments by expert raters.

Samples from the Perceptual Voice Qualities Database were used, including sustained vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences, which had previously been expertly rated on a 0-100 scale. The OpenSMILE (audEERING GmbH, Gilching, Germany) toolkit was used to extract acoustic (Mel-Frequency Cepstral Coefficient-based, n = 1428) and prosodic (n = 152) features, pitch onsets, and recording duration. We used a support vector machine with these features (n = 1582) for automated assessment of dysphonia severity. Recordings were separated into vowels (V) and sentences (S), and features were extracted separately from each. Final voice quality predictions were made by combining the features extracted from the individual components with those of the whole audio (WA) sample (three file sets: S, V, WA).

The algorithm's estimates correlate highly (r = 0.847) with those of expert raters, with a root mean square error of 13.36. Increasing signal complexity improved the estimation of dysphonia severity: combining the features outperformed the WA, S, and V sets individually.

A novel machine-learning algorithm produced perceptual estimates of dysphonia severity from standardized audio samples on a 100-point scale, and these estimates were highly correlated with expert ratings. This suggests that machine-learning algorithms could offer an objective method for evaluating voice samples for dysphonia severity.
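As a rough sketch of the pipeline the abstract describes, the fragment below illustrates two points: combining the three file sets (S, V, WA) by concatenating their feature vectors before regression, and the agreement metrics (Pearson's r and RMSE) reported against expert ratings. It uses NumPy only; the feature vectors are random stand-ins and both helper functions are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Per the abstract: 1428 MFCC-based + 152 prosodic features, plus pitch
# onsets and recording duration, for n = 1582 features per file set.
N_FEATURES = 1582

rng = np.random.default_rng(0)

# Stand-in feature vectors for one recording's three file sets:
# sentences (S), sustained vowel (V), and whole audio (WA).
feats_s = rng.normal(size=N_FEATURES)
feats_v = rng.normal(size=N_FEATURES)
feats_wa = rng.normal(size=N_FEATURES)

# "Combining the features" is sketched here as simple concatenation of
# the three per-set vectors before they reach the support vector regressor.
combined = np.concatenate([feats_s, feats_v, feats_wa])

def pearson_r(y_true, y_pred):
    """Pearson correlation between expert ratings and model estimates."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(yt, yp)[0, 1])

def rmse(y_true, y_pred):
    """Root mean square error on the 0-100 severity scale."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((yt - yp) ** 2)))
```

With these helpers, the study's reported agreement (r = 0.847, RMSE 13.36) could be reproduced on held-out predictions versus the 0-100 expert ratings; the actual model training (SVM on the 1582 features) is omitted here.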

Keywords

Male, Adult, Observer Variation, Support Vector Machine, Databases, Factual, Voice Quality, Reproducibility of Results, Signal Processing, Computer-Assisted, Acoustics, Dysphonia, Severity of Illness Index, Speech Acoustics, Machine Learning, Judgment, Speech Production Measurement, Predictive Value of Tests, Speech Perception, Humans, Female, Algorithms

Impact indicators (provided by BIP!)

  • Selected citations: 4. Citations derived from selected sources; an alternative to the "Influence" indicator.
  • Popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
  • Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
  • Impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.