ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO, Datacite

DnR-nonverbal dataset

Authors: Takuya Hasumi; Yusuke Fujita

Abstract

Introduction

DnR-nonverbal is a dataset for cinematic audio source separation (CASS) based on the Divide and Remaster (DnR) dataset. Unlike conventional datasets, it contains non-verbal sounds such as laughter and screaming, as actual movie audio does. This lets CASS models allocate non-verbal sounds to the same stem as speech. Examples of clips and separation results are available at https://tky823.github.io/hasumi2025dnr.github.io/

How to Use

1. Download dnr-nonverbal.tar.gz from this page.
2. Extract the archive with: tar xvzf dnr-nonverbal.tar.gz
3. (Optional) Merge the directories with those of DnR. Sample IDs are assigned so that they do not duplicate any DnR sample ID.

Dataset Structure

The dataset structure is based on DnR, except that this dataset contains non-verbal sounds as part of the speech stem.

    dnr-nonverbal
    ├── tr
    │   ├── 100009
    │   │   ├── annots.csv
    │   │   ├── background.wav
    │   │   ├── foreground.wav
    │   │   ├── mix.wav
    │   │   ├── music.wav
    │   │   ├── nonverbal.wav
    │   │   ├── reading.wav
    │   │   ├── sfx.wav
    │   │   └── speech.wav
    │   ├── 100031
    │   ...
    ├── cv
    └── tt

reading.wav: Reading-style speech extracted from LibriSpeech.
nonverbal.wav: Non-verbal sounds collected from FSD50K and newly crawled from FreeSound.
speech.wav: Mixture of reading-style speech and non-verbal sounds.
music.wav: Background music extracted from FMA (medium).
foreground.wav: Foreground effect sounds collected from FSD50K.
background.wav: Background effect sounds collected from FSD50K.
sfx.wav: Foreground and background effect sounds.
annots.csv: A metadata file that identifies the sources of the sounds.

Citation

    @inproceedings{hasumi25_interspeech,
      title     = {{DnR-nonverbal: Cinematic audio source separation dataset containing non-verbal sounds}},
      author    = {Takuya Hasumi and Yusuke Fujita},
      year      = {2025},
      booktitle = {Interspeech 2025},
      pages     = {4993--4997},
      doi       = {10.21437/Interspeech.2025-1148},
      issn      = {2958-1796},
    }
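As an illustration of the directory layout described under Dataset Structure, the following is a minimal sketch of how one might index a split after extraction. It is not part of the dataset or of any official tooling: the names STEMS, index_split, and missing_stems are hypothetical, and the sketch assumes only the file names shown in the tree (one directory per sample ID under tr, cv, and tt). It uses the Python standard library only and does not read any audio.

```python
from pathlib import Path

# Stem file names taken from the directory tree in the dataset description.
STEMS = (
    "mix", "speech", "reading", "nonverbal",
    "music", "sfx", "foreground", "background",
)


def index_split(root, split):
    """Map each sample ID in a split (tr, cv, or tt) to the stem files found.

    Returns {sample_id: {stem_name: Path}}; stems whose .wav file is absent
    are simply left out of the inner dictionary.
    """
    split_dir = Path(root) / split
    index = {}
    for sample_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        stems = {}
        for name in STEMS:
            wav_path = sample_dir / f"{name}.wav"
            if wav_path.is_file():
                stems[name] = wav_path
        index[sample_dir.name] = stems
    return index


def missing_stems(index):
    """Return {sample_id: [missing stem names]} for incomplete samples only."""
    report = {}
    for sample_id, stems in index.items():
        missing = [name for name in STEMS if name not in stems]
        if missing:
            report[sample_id] = missing
    return report
```

Assuming the archive was extracted into the current directory, index_split("dnr-nonverbal", "tr") would return one entry per training sample, and missing_stems on that result gives a quick check that extraction (or a merge with DnR) left every sample complete.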

Keywords

non-verbal sound, audio source separation, cinematic audio source separation
