Dataset, 2026
License: CC BY
Data sources: ZENODO

MEG Audiovisual Matrix-Sentence Dataset (using Natural, Avatar, Degraded and Still-Image Version of the Speaker)

Authors: Riegel, Jasmin; Schüller, Alina; Reichenbach, Tobias


Abstract

Audiovisual Stimuli

We used German five-word matrix sentences derived from the Oldenburger Satztest. We video-recorded a speaker speaking 571 different randomly generated sentences (originally 600; after checking for pronunciation and disturbing sounds, 571 remained) at a frame rate of 29.97 fps and an audio sampling rate of 44.1 kHz. We then processed the video clips into four audiovisual conditions:

- Natural Video: the natural video of the speaker.
- Degraded Video: a visually degraded version of the videos, produced with FFmpeg's edgedetect filter:
      ffmpeg -i input.mp4 -vf "edgedetect=low=0.1:high=0.4" output.mp4
- Avatar: an avatar of the speaker generated with software from the company D-iD (https://www.d-id.com/), which uses a CNN-based image encoder to process a still image of the talker and a GAN image-to-video model to animate lip movements in sync with the input audio.
- Still Image: a still image of the speaker combined with the audio track.

Experimental Design

Each participant took part in two measurement sessions. In both sessions, sentences with different visual stimuli were presented in four-talker babble noise at -4 dB SNR. After each audiovisual sentence, the participants repeated what they had understood; after each visual-only sentence, they repeated the name they had lip-read.

The sessions were structured as follows:

Session 1:
- SRT50 measurement with 80 audio-only sentences (data not included due to storage limitations; available upon request)
- Block 1: an audiovisual block (three random sentences of each audiovisual condition in random order, i.e. 12 sentences), followed by a visual-only block (three random sentences of each visual-only condition in random order, i.e. 9 sentences)
- Blocks 2-8: same structure as block 1

Session 2:
- Blocks 1-12: same structure as the blocks in session 1 (an audiovisual block followed by a visual-only block)

MEG and Behavioral Data Structure

MEG data of 32 participants are contained in this data set.
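The block structure above fully determines the trial counts per session. The following sketch (not part of the dataset) checks the arithmetic; the count of three visual-only conditions is inferred from the "9 sentences" per visual-only block, as the description does not name the visual-only conditions explicitly.

```python
# Sanity check of the trial counts implied by the block structure.
AV_CONDITIONS = 4   # natural, degraded, avatar, still_image
VO_CONDITIONS = 3   # inferred: 9 visual-only sentences / 3 per condition
SENT_PER_COND = 3   # three random sentences of each condition per block

av_per_block = AV_CONDITIONS * SENT_PER_COND  # 12 audiovisual sentences
vo_per_block = VO_CONDITIONS * SENT_PER_COND  # 9 visual-only sentences

session1 = 8 * (av_per_block + vo_per_block)   # 8 block pairs
session2 = 12 * (av_per_block + vo_per_block)  # 12 block pairs

print(session1, session2, session1 + session2)  # 168 252 420
```

The totals match the overview-file indexing described below: sentences i = 0-167 in session 1, i = 168-419 in session 2, and 420 raw files per participant.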
Each participant has a directory "participants/px". The participant folder contains a "px_overview.csv" file and a folder with all the MEG data, "participants/px/meg_data". The overview file contains the sentence presentation order and the behavioral data. It is structured as follows:

- Column 1: numbers the sentences in the order they were presented across the measurement sessions (i). Sentences i = 0-167 were presented in session 1; sentences i = 168-419 were presented in session 2.
- Column 2: the visual condition of the presented sentence (natural, degraded, avatar, still_image).
- Column 3: the sentence ID (1-571) of the presented sentence (corresponding to the ID in the stimuli directories).
- Column 4: 1 if the audio was audible for the participant (audiovisual stimuli), 0 if it was not (visual-only stimuli).
- Column 5: 1 if the presented name was understood/lip-read correctly.
- Column 6: 1 if the presented verb was understood correctly.
- Column 7: 1 if the presented amount was understood correctly.
- Column 8: 1 if the presented adjective was understood correctly.
- Column 9: 1 if the presented subject was understood correctly.

For visual-only stimuli, columns 6-9 are always 0, as participants only repeated the name.

The "participants/px/meg_data" directory contains an "i-raw.fif" file for each of the 420 presented sentences. The files can be loaded with the MNE library as follows:

    meg = mne.read_raw_fif(".../1-raw.fif")
    meg_data = meg.get_data()  # access the data
    meg_info = meg.info        # access the measurement info

The "participants/" directory additionally contains:

- "participants/participants_overview.csv": an overview of the age and sex of the participants.
- "read_me.txt": information on individual missing sentences of individual participants.

Stimuli Data Structure

The "stimuli" directory contains a folder for each audiovisual stimulus condition (avatar, degraded, natural, still_image).
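As an illustration of how the overview files can be used, the sketch below computes per-condition word-report accuracy with pandas. The column names are hypothetical stand-ins for the nine columns described above (the actual header of "px_overview.csv" may differ, and whether the file carries a header row at all is an assumption to check against the data).

```python
import pandas as pd

# Hypothetical names for the nine overview columns described above;
# adjust to the actual px_overview.csv header before use.
COLS = ["trial", "condition", "sentence_id", "audible",
        "name", "verb", "amount", "adjective", "subject"]
WORDS = ["name", "verb", "amount", "adjective", "subject"]

def condition_accuracy(overview_csv: str) -> pd.Series:
    """Mean fraction of correctly repeated words per visual condition,
    restricted to audiovisual trials (audible == 1)."""
    df = pd.read_csv(overview_csv, names=COLS)
    av_trials = df[df["audible"] == 1]
    return av_trials.groupby("condition")[WORDS].mean().mean(axis=1)
```

Visual-only trials are excluded because their word-score columns beyond the name are always 0 by design.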
Additionally, the mp3 file of each sentence is in the folder "stimuli/mp3_files". Each of the five folders contains a version of each sentence, identifiable by its sentence ID (1-571).

Technical Details

The MEG data were recorded at the University Hospital in Erlangen, Germany, with a 248-magnetometer system (4D Neuroimaging, San Diego, CA, USA). The video signal was presented via a projector outside the shielded chamber and displayed on a screen above the participant via mirrors. The audio signal was transmitted via 2 m-long, 2 cm-diameter tubes, resulting in a 6 ms delay. The stimuli used in the experiment were corrected for this delay; the stimuli provided here have the original alignment (not corrected for the setup-specific 6 ms delay). The attended sentence and the babble noise were presented diotically to both ears at a sound pressure level of 68 dB(A).

Processing of MEG Data

Three MEG channels were removed from all measurement data, as they were broken and showed no signal. The data were analog-filtered from 1.0 to 200 Hz, then offline-filtered with a notch filter (firwin design, 0.5 Hz bandwidth) at the power-line frequency and its harmonics (50, 100, 150, 200 Hz). The data were then resampled from 1017.25 Hz to 1000 Hz.

Alignment of Audio and MEG Data

The MEG data are cut into sentence-long snippets aligned with the mp3 files. Load an mp3 file and resample it to 1000 Hz, then load the MEG file corresponding to the same sentence ID. The two loaded instances should now have the same number of time samples.
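Since the provided stimuli keep the original alignment, re-applying the setup-specific 6 ms tube delay may be useful when modeling the audio as it arrived at the ear. A minimal sketch, assuming the 1000 Hz analysis rate and a simple zero-padded shift (the function name is illustrative, not part of the dataset):

```python
import numpy as np

FS = 1000     # Hz, sampling rate after resampling to match the MEG data
DELAY_MS = 6  # setup-specific tube delay described above

def apply_tube_delay(audio: np.ndarray, fs: int = FS,
                     delay_ms: float = DELAY_MS) -> np.ndarray:
    """Shift the audio later in time by delay_ms, keeping its length
    constant by zero-padding the start and truncating the end."""
    n = int(round(fs * delay_ms / 1000.0))
    return np.concatenate([np.zeros(n, dtype=audio.dtype),
                           audio[:len(audio) - n]])
```

At 1000 Hz the 6 ms delay corresponds to exactly 6 samples.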
You can use librosa to load and resample the mp3 file:

    audio_data, sr = librosa.load(audio_path, sr=None)
    audio_data = librosa.resample(audio_data, orig_sr=sr, target_sr=1000)

Paper to Cite When Using This Data

Riegel et al., "Talking avatars can differentially modulate cortical speech tracking in the high and in the low delta band" (https://doi.org/10.64898/2026.01.07.695461)

Example Code

Example code for computing temporal response functions (TRFs) and predictor variables is provided in a repository by Alina Schüller: https://github.com/Al2606/MEG-Analysis-Pipeline
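As an alternative to librosa, the same 44.1 kHz to 1000 Hz resampling can be sketched with SciPy's polyphase resampler; the random 2 s signal below is only a placeholder for a loaded waveform.

```python
import numpy as np
from scipy.signal import resample_poly

fs_audio = 44_100                          # mp3 sampling rate given above
audio_44k = np.random.randn(2 * fs_audio)  # placeholder for a loaded mp3

# 44_100 -> 1000 Hz reduces to the integer ratio 10 / 441
audio_1k = resample_poly(audio_44k, up=10, down=441)

print(audio_1k.shape)  # (2000,) i.e. 2 s at 1000 Hz
```

The resulting length can then be compared against the matching "i-raw.fif" snippet (meg.get_data().shape[-1]) to confirm the alignment described above.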

Keywords

Speech Comprehension, Audiovisual, Magnetoencephalography, Avatar, Oldenburger Satztest
