
pmid: 37429808
Auditory-perceptual assessments are the gold standard for assessing voice quality. This project aims to develop a machine-learning model that measures the perceptual dysphonia severity of audio samples in agreement with assessments by expert raters.

Samples from the Perceptual Voice Qualities Database were used, comprising sustained vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences that had previously been rated by experts on a 0-100 scale. The OpenSMILE (audEERING GmbH, Gilching, Germany) toolkit was used to extract acoustic (Mel-Frequency Cepstral Coefficient-based, n = 1428) and prosodic (n = 152) features, pitch onsets, and recording duration. We used a support vector machine with these features (n = 1582) for automated assessment of dysphonia severity. Recordings were separated into vowels (V) and sentences (S), and features were extracted separately from each. Final voice quality predictions were made by combining the features extracted from the individual components with those from the whole audio (WA) sample (three file sets: S, V, WA).

The algorithm's predictions correlate highly (r = 0.847) with the estimates of expert raters. The root mean square error was 13.36. Increasing signal complexity resulted in better estimation of dysphonia: combining the features outperformed the WA, S, and V sets individually.

A novel machine-learning algorithm was able to perform perceptual estimates of dysphonia severity on a 100-point scale using standardized audio samples. Its estimates were highly correlated with expert ratings. This suggests that ML algorithms could offer an objective method for evaluating voice samples for dysphonia severity.
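
The two agreement measures reported in the abstract, Pearson correlation (r) and root mean square error, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sample ratings, and the assumption that the S, V, and WA feature vectors are combined by plain concatenation are all hypothetical and used only for illustration.

```python
import math


def combine_features(sentence, vowel, whole_audio):
    """Concatenate the S, V, and WA feature vectors into one vector.

    Assumption: the abstract says the feature sets are "combined" but does
    not state how; simple concatenation is used here as an illustration.
    """
    return list(sentence) + list(vowel) + list(whole_audio)


def pearson_r(predicted, rated):
    """Pearson correlation between model predictions and expert ratings."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(rated) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, rated))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in rated)
    return cov / math.sqrt(var_p * var_r)


def rmse(predicted, rated):
    """Root mean square error, in the same 0-100 units as the ratings."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, rated)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical severity scores on the 0-100 scale (not study data).
    expert_ratings = [10.0, 35.0, 60.0, 85.0]
    model_predictions = [15.0, 30.0, 70.0, 80.0]
    print("r =", round(pearson_r(model_predictions, expert_ratings), 3))
    print("RMSE =", round(rmse(model_predictions, expert_ratings), 2))
```

Because the RMSE is expressed in rating units, the reported value of 13.36 can be read directly against the 0-100 severity scale, whereas r only describes how closely predictions track the expert ratings.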
Male, Adult, Observer Variation, Support Vector Machine, Databases, Factual, Voice Quality, Reproducibility of Results, Signal Processing, Computer-Assisted, Acoustics, Dysphonia, Severity of Illness Index, Speech Acoustics, Machine Learning, Judgment, Speech Production Measurement, Predictive Value of Tests, Speech Perception, Humans, Female, Algorithms
| selected citations | 4 |
| popularity | Top 10% |
| influence | Average |
| impulse | Top 10% |
