Publication: Article, Preprint. 11 Aug 2024. Embargo end date: 01 Jan 2024. Publisher: Open Exploration Publishing. Journal: Exploration of Digital Health Technologies

Authors: Yuyang Yan; Wafaa Aljbawi; Sami O. Simons; Visara Urovi

doi: 10.37349/edht.2024.00022, 10.48550/arxiv.2402.07619

arXiv: 2402.07619

Developing a multi-variate prediction model for COVID-19 from crowd-sourced respiratory voice data

Abstract

Aim: COVID-19 has affected more than 223 countries worldwide, and in the post-COVID era there is a pressing need for non-invasive, low-cost, and highly scalable solutions to detect it. This study focuses on the analysis of voice features and machine learning models for the automatic detection of COVID-19.

Methods: We develop deep learning models to identify COVID-19 from voice recordings alone, which is the novelty of this work. We use the Cambridge COVID-19 Sound database, which contains 893 speech samples crowd-sourced from 4,352 participants via a COVID-19 Sounds app. We extract voice features including Mel-spectrograms, Mel-frequency cepstral coefficients (MFCC), and convolutional neural network (CNN) encoder features. Based on the voice data, we develop deep learning classification models to detect COVID-19 cases: long short-term memory (LSTM), CNN, and Hidden-Unit BERT (HuBERT).

Results: We compare the predictive power of these models to baseline machine learning models. HuBERT achieves the highest accuracy of 86% and the highest AUC of 0.93.

Conclusions: The proposed models show promising results for COVID-19 diagnosis from voice recordings when compared with the state-of-the-art.
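
The abstract reports two evaluation metrics, accuracy and AUC. As a minimal sketch of how such metrics can be computed for a binary COVID-19 classifier, the following pure-Python example uses the pairwise-ranking definition of AUC. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the paper and do not reproduce its evaluation pipeline.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 default threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (1 = COVID-19 positive), NOT results from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

toy_accuracy = accuracy(toy_labels, toy_scores)  # 4 of 6 correct
toy_auc = roc_auc(toy_labels, toy_scores)        # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a study might report both.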

Related Organizations

Maastricht University, Netherlands
Maastricht University Medical Centre, Netherlands

Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Artificial Intelligence, mfcc, R, deep learning, Information technology, T58.5-58.64, Computer Science - Sound, covid-19 diagnosis, voice analysis, machine learning, Artificial Intelligence (cs.AI), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Medicine, mel-spectrogram, Electrical Engineering and Systems Science - Audio and Speech Processing

Impact by BIP!

selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.