Brain-Assisted Speech Enhancement Dataset and code USTC

Dataset Introduction

This dataset was released together with the paper “Qingtian Xu, Jie Zhang, Miao Sun, Huadong Liang, Xin Li, Zhenhua Ling, Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding, IEEE Trans. Audio, Speech, Lang. Process. (TASLP), 34:3073-3086, 2026.” The official implementation code is also publicly available. If you use this dataset in your research, please cite the corresponding paper.

Overview

This dataset is designed for research on Auditory Attention Decoding (AAD) and Brain-Assisted Speech Enhancement (BASE) under dichotic listening conditions. The dataset contains synchronized EEG recordings and speech stimuli collected from normal-hearing subjects performing sustained auditory attention tasks. A key characteristic of this dataset is that no identical speech pairs are repeated, which helps reduce shortcut learning caused by repeated stimulus combinations and enables more rigorous evaluation of AAD-driven BASE systems.

Dataset Description

The dataset includes 18 normal-hearing subjects aged between 20 and 35 years. During the experiment, each subject is instructed to attend to one of two competing speakers in a dichotic listening scenario. Each subject participates in 20 trials, where:

- Each trial lasts 120 seconds.
- Subjects switch their attended speaker between two successive trials.
- Two speech stimuli are simultaneously presented through earphones, one for each ear.
- After each trial, subjects are required to answer multiple-choice comprehension questions to verify attention engagement.

The recordings are conducted in a quiet office environment. During data acquisition, the computer screen in front of the subjects remains plain and free of visual distractions or additional stimuli.

Speech Stimuli

The speech materials consist of news recordings from Xinwen Lianbo, spoken by four native Chinese speakers and sampled at 44.1 kHz.

The original speech recordings are randomly paired to construct two-speaker mixtures. For each subject, the presented speech pairs are randomly selected from combinations of two out of the four speakers. Importantly:

- No identical speech pairs appear in the dataset.
- To guarantee this property, all trials are carefully checked.
- Four potentially repeated trials are removed from the final release.

This design improves the reliability of generalization evaluation for AAD and BASE models.

EEG Recording

EEG signals are recorded using the BioSemi ActiveTwo system with 64 electrodes at a sampling rate of 8192 Hz. For downstream AAD and BASE evaluation, the EEG data are further:

- Downsampled to 128 Hz
- Band-pass filtered between 0.1 Hz and 45 Hz
- Processed with Independent Component Analysis (ICA) for artifact removal

File Structure

- README.md: Dataset documentation and usage instructions.
- group1.xlsx: Experimental paradigm records and trial metadata.
- test.m: MATLAB preprocessing script for raw EEG recordings. The script generates .mat files used by the dataset generation pipeline.

Preprocessing Pipeline

The provided test.m script performs EEG preprocessing using EEGLAB. The preprocessing steps include:

- Loading Raw EEG Data: Loads BioSemi .cdt files.
- Resampling: Downsamples EEG signals to 128 Hz.
- Epoch Extraction: Extracts 120-second epochs for each trial.
- Band-pass Filtering: Applies a 0.1–45 Hz band-pass filter.
- Non-EEG Channel Removal: Removes the HEO, VEO, TRIGGER, EKG, and EMG channels.
- Artifact Removal: Performs ICA using runica (extended mode) and uses ICLabel for automatic component classification. Removes components labeled as Muscle, Eye, or Channel Noise. Components with probability greater than 0.7 are automatically rejected.
- Re-referencing: Applies average reference.

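The rejection rule in the artifact-removal step can be illustrated with a small Python sketch. It is not taken from test.m. It assumes that a component is rejected when its ICLabel probability for Muscle, Eye, or Channel Noise is strictly greater than 0.7. The function name `components_to_reject` is hypothetical and the probability values are made up for illustration.

```python
# ICLabel's seven classes, in the order the classifier reports them.
ICLABEL_CLASSES = (
    "Brain", "Muscle", "Eye", "Heart", "Line Noise", "Channel Noise", "Other",
)

# Classes the README lists for removal, and the stated probability threshold.
REJECT_CLASSES = ("Muscle", "Eye", "Channel Noise")
THRESHOLD = 0.7


def components_to_reject(class_probs, threshold=THRESHOLD):
    """Return indices of components to reject.

    class_probs holds one row per independent component; each row has seven
    probabilities ordered as in ICLABEL_CLASSES. A component is flagged when
    its probability for any class in REJECT_CLASSES exceeds the threshold
    (assumption: strict inequality).
    """
    reject_cols = [ICLABEL_CLASSES.index(name) for name in REJECT_CLASSES]
    rejected = []
    for comp_idx, row in enumerate(class_probs):
        if len(row) != len(ICLABEL_CLASSES):
            raise ValueError("each row needs one probability per ICLabel class")
        if any(row[col] > threshold for col in reject_cols):
            rejected.append(comp_idx)
    return rejected


# Hypothetical probabilities for four components (illustrative values only).
example_probs = [
    [0.90, 0.02, 0.03, 0.01, 0.01, 0.01, 0.02],  # mostly brain: kept
    [0.05, 0.85, 0.02, 0.02, 0.02, 0.02, 0.02],  # muscle above 0.7: rejected
    [0.20, 0.05, 0.65, 0.02, 0.02, 0.03, 0.03],  # eye below 0.7: kept
    [0.10, 0.02, 0.02, 0.02, 0.02, 0.80, 0.02],  # channel noise above 0.7: rejected
]
print(components_to_reject(example_probs))
```

Keeping the class names and threshold as named constants makes it easy to compare a different rejection policy against the one described here.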
Excluded Trials

To ensure that no identical speech pairs exist in the released dataset, the following trials are excluded:

- Subject 13, Trial 16
- Subject 14, Trial 13
- Subject 16, Trial 4
- Subject 17, Trial 18

Important Notes for AAD-Driven BASE Research

For AAD-driven Brain-Assisted Speech Enhancement tasks, we strongly recommend avoiding EEG preprocessing methods based on frequency-band coupling. Such operations may alter the original linear characteristics of EEG signals and potentially make the optimization of AAD modules significantly more difficult.

Citation

If you use this dataset, please cite the article:

@ARTICLE{11540442,
  author={Xu, Qing-Tian and Zhang, Jie and Sun, Miao and Liang, Huadong and Li, Xin and Ling, Zhen-Hua},
  journal={IEEE Transactions on Audio, Speech and Language Processing},
  title={Analysis of Brain-Assisted Speech Enhancement Models Incorporating Auditory Attention Decoding},
  year={2026},
  volume={34},
  number={},
  pages={3073-3086}
}
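For scripts that iterate over the recordings, the exclusions listed under Excluded Trials can be kept in one place. The following is a minimal sketch, not part of the released code. It assumes that subjects and trials are numbered from 1 and that every other trial is present in the release; the name `retained_trials` is hypothetical.

```python
# Trial layout stated in the README: 18 subjects, 20 trials each, 120 s per trial.
NUM_SUBJECTS = 18
TRIALS_PER_SUBJECT = 20
TRIAL_SECONDS = 120

# (subject, trial) pairs excluded from the release, as listed under "Excluded Trials".
EXCLUDED_TRIALS = {(13, 16), (14, 13), (16, 4), (17, 18)}


def retained_trials():
    """List every (subject, trial) pair that remains after the exclusions.

    Assumes 1-based numbering for both subjects and trials.
    """
    return [
        (subject, trial)
        for subject in range(1, NUM_SUBJECTS + 1)
        for trial in range(1, TRIALS_PER_SUBJECT + 1)
        if (subject, trial) not in EXCLUDED_TRIALS
    ]


trials = retained_trials()
total_minutes = len(trials) * TRIAL_SECONDS / 60
print(len(trials), total_minutes)
```

Under these assumptions the loop yields 356 trials, which is 20 trials per subject across 18 subjects minus the four exclusions.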
