## Overview

This repository contains the SED-Augmented SFX dataset (ASFX-SED) used in the paper [FLAM: Frame-Wise Language-Audio Modeling](https://arxiv.org/abs/2505.05335). The dataset is designed for research and development in open-set sound event detection, and can also be used for event separation and other related machine learning tasks.

- **Original Source:** Adobe Audition sound effects dataset (https://www.adobe.com/products/audition/offers/adobeauditiondlcsfx.html). The same dataset is also used in other audio research (e.g., https://arxiv.org/abs/2308.09089).
- **Format:** Parquet (tabular metadata) + JSON (per-sample metadata) + WAV (audio files)
- **License:** ADOBE RESEARCH LICENSE (see LICENSE.md)

## Dataset Structure

```
├── asfx_sed_metadata.parquet  # Metadata (Parquet)
├── asfx_sed/                  # Dataset folder
│   ├── 0000000.json           # Per-sample metadata (JSON)
│   ├── 0000000_mix.wav        # Mixed audio
│   ├── 0000000_event_0.wav    # Event audio
│   └── ...
```

All audio files are mono with a 48 kHz sample rate.

## Parquet File (`asfx_sed_metadata.parquet`)

Each row corresponds to a single audio sample. The following fields are included:

- `events` (list): List of event dicts before RMS relabeling (see below)
- `background` (dict): Background audio metadata
- `background_caption` (str): Description of the background audio
- `events_loudness` (list): Loudness values for each event (in dB) before RMS relabeling
- `events_caption` (list): Caption for each event
- `events_ucs_category` (list): UCS category for each event (https://universalcategorysystem.com/)
- `events_caption_range` (list): Start and end times for each event occurrence, in seconds
- `events_id` (list): Event IDs
- `id` (str): Unique sample ID for the mixture

**RMS relabeling:** During dataset synthesis, we analyze the RMS (root mean square) energy of each event to identify and relabel silent segments as negative examples. As a result, a single original event may be split into two or more events after relabeling. The `events` and `events_loudness` fields contain metadata for each event before RMS relabeling, while `events_caption`, `events_ucs_category`, `events_caption_range`, and `events_id` correspond to each event after relabeling. If an event is split into multiple segments, the lists in the latter fields will be longer than those in the former. (An illustrative sketch of this idea is included under Usage Example below.)

Example of an `events` entry (list of dicts):

```
[
  {
    "id": "...",
    "sample_rate": 48000,
    "wav": "...wav",
    "duration": 1.23,
    "caption": "...",
    "ucs_category": "...",
    "start_time": 0.0,
    "end_time": 1.23
  },
  ...
]
```

Example of a `background` entry (dict):

```
{
  "id": "...",
  "sample_rate": 48000,
  "wav": "...wav",
  "duration": 90.1,
  "caption": "...",
  "ucs_category": "..."
}
```

## JSON Files

Each JSON file in `asfx_sed/` contains the same fields as a row in the Parquet file, but for a single sample. The corresponding audio files are in the same folder.

## Usage Example

### Loading the Parquet Metadata

```python
import pandas as pd

metadata = pd.read_parquet('asfx_sed_metadata.parquet')
print(metadata.head())
```

### Accessing Audio and JSON

```python
import json

with open('asfx_sed/0000000.json', 'r') as f:
    sample = json.load(f)
print(sample['background_caption'])
```
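### Extracting Event Segments

The per-event timing annotations can be used to cut event occurrences out of a mixture. Below is a minimal sketch (not part of the released tooling) that slices the mixed waveform using `events_caption_range`; it assumes each entry is a `[start, end]` pair in seconds and that the `soundfile` package is installed.

```python
import json
import soundfile as sf

with open('asfx_sed/0000000.json', 'r') as f:
    sample = json.load(f)

# Mixtures are mono, 48 kHz; sf.read returns (waveform, sample_rate).
mix, sr = sf.read('asfx_sed/0000000_mix.wav')

# Assumption: each events_caption_range entry is a [start, end] pair in seconds.
for caption, (start, end) in zip(sample['events_caption'],
                                 sample['events_caption_range']):
    segment = mix[int(start * sr):int(end * sr)]
    print(f'{caption}: {len(segment) / sr:.2f} s')
```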
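### Illustrating RMS Relabeling

For intuition about the RMS relabeling described above, the following sketch flags frames whose RMS energy falls below a silence threshold. It is illustrative only: the frame length and the -60 dBFS threshold are assumed values, not the parameters used to synthesize ASFX-SED.

```python
import numpy as np

def find_silent_frames(wav, sr, frame_sec=0.1, threshold_db=-60.0):
    """Return a boolean mask of frames whose RMS falls below threshold_db.

    Illustrative only: frame_sec and threshold_db are assumptions, not the
    values used to build the dataset.
    """
    frame_len = int(frame_sec * sr)
    n_frames = len(wav) // frame_len
    frames = wav[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    rms_db = 20.0 * np.log10(np.maximum(rms, 1e-10))  # avoid log(0)
    return rms_db < threshold_db

# Example on a synthetic signal: 1 s of noise followed by 1 s of silence.
sr = 48000
wav = np.concatenate([0.1 * np.random.randn(sr), np.zeros(sr)])
print(find_silent_frames(wav, sr))  # False for noisy frames, True for silent ones
```

Runs of silent frames inside an original event are what cause it to be split into multiple relabeled segments.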
### PyTorch DataLoader Example

A simple PyTorch `Dataset` and `DataLoader` for this dataset is provided in `dataloader_example.py`.

Example usage:

```python
from dataloader_example import ASFX_SED_Dataset
from torch.utils.data import DataLoader

dataset = ASFX_SED_Dataset(
    parquet_path='asfx_sed_metadata.parquet',
    audio_dir='asfx_sed/',
)
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

for batch in dataloader:
    print(batch['id'])
    print(batch['background_caption'])
    # batch['audio'] is a list of numpy arrays (waveforms)
    break
```

See the code and comments in `dataloader_example.py` for details on how to customize loading, audio processing, and batching.

## Citation

If you use this dataset in your research or find it helpful, please cite the following paper:

```
@inproceedings{wu2025flam,
  title={{FLAM}: Frame-Wise Language-Audio Modeling},
  author={Yusong Wu and Christos Tsirigotis and Ke Chen and Cheng-Zhi Anna Huang and Aaron Courville and Oriol Nieto and Prem Seetharaman and Justin Salamon},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
}
```

---

**Contact:** Yusong Wu (wu.yusong@mila.quebec), Justin Salamon (salamon@adobe.com)

**License:** ADOBE RESEARCH LICENSE (see LICENSE.md)