Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection

Andreas Triantafyllopoulos; Anastasia Semertzidou; Meishu Song; Florian B. Pokorny; Björn W. Schuller

ZENODO

Dataset . 2022

License: CC BY

Data sources: Datacite, ZENODO

Research data: Dataset. Published: 01 Sep 2022. Language: English. Publisher: Zenodo. Funded by: EC | sustAGE

DOI: 10.5281/zenodo.6962929, 10.5281/zenodo.6962930

Abstract

The COVYT dataset contains speech samples from individuals who self-reported their COVID-19 infection on public social media platforms (YouTube, Xiaohongshu). These videos, together with videos of the same people recorded prior to infection, were mined to gather publicly available data for COVID-19 research. This release includes links to the original videos along with the manual segmentation and diarisation that identifies the utterances of the target individuals. We additionally release features derived from the segmented utterances. Finally, the dataset includes partitioning information for four different cross-validation schemes. See the arXiv preprint for more details: https://arxiv.org/abs/2206.11045

Related Organizations

Universität Augsburg
Germany

Keywords

machine learning, speech dataset, computer audition, COVID-19, speech pathology, disease detection

Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average