A multi-speaker dataset of real-time two-dimensional speech magnetic resonance images with articulator ground-truth segmentations

Keywords: Image segmentation, Magnetic resonance imaging, Real-time imaging, Articulators, Speech, Velopharyngeal closure, Deep learning, Ground-truth segmentation, MRI

Ruthven, Matthieu; Peplinski, Agnieszka; Miquel, Marc



ZENODO

Dataset . 2023

License: CC BY

Data sources: Datacite



Research data > Dataset · 01 Jan 2023 · English · Publisher: Zenodo


doi: 10.5281/zenodo.7595163, 10.5281/zenodo.10046815, 10.5281/zenodo.7595164



Abstract

Summary
This dataset consists of real-time magnetic resonance images of speech, together with corresponding ground-truth (GT) segmentations and velopharyngeal closure labels.

Images
The images are of five healthy adult volunteers (two female, three male; age range 24-28 years), each counting once from 1 to 10 in English. Each volunteer was imaged in the supine position using a 3.0 T TX Achieva magnetic resonance imaging (MRI) scanner and a 16-channel neurovascular coil (both Philips Healthcare, Best, Netherlands). Images of a 10 mm thick midsagittal slice of the head were acquired using a steady-state free precession (SSFP) pulse sequence based on the sequence identified by [1] as optimal for vocal tract image quality. The acquired matrix size and in-plane pixel size were 120×93 and 2.50×2.45 mm², respectively. However, the scanner zero-padded the k-space data to a matrix size of 256×256 before reconstruction, giving a reconstructed in-plane pixel size of 1.17×1.17 mm². Images were acquired at a temporal resolution of 0.1 s (10 images per second), and one image series was acquired per volunteer.

The volunteers were instructed to perform the speech task at a rate they considered normal. Some performed the task faster than others, so the series do not all contain the same number of images: they contain 105, 71, 71, 78 and 67 images respectively (392 images in total).

Velopharyngeal closure labels
Each image was visually inspected and labelled according to whether or not it showed contact between the soft palate and the posterior pharyngeal wall. A label of 1 indicates contact and a label of 0 indicates no contact. To reduce the subjectivity of the labels, each image was labelled independently by four MRI physicists with four, ten, two and one years of speech MRI experience, and the majority label was chosen as the GT label.

Ground-truth segmentations
GT segmentations were created by manually labelling the pixels in each image.
The segmentations consisted of six classes, each made up of one or more anatomical features. The classes did not overlap: a pixel could not belong to more than one class. For conciseness, the classes were named head, soft palate, jaw, tongue, vocal tract and tooth space. However, the names of the head, jaw and tongue classes are simplifications:
- The head class consisted of all anatomical features superior or posterior to the vocal tract. It therefore included the upper lip, hard palate, brain, skull, posterior pharyngeal wall and neck.
- The jaw class consisted of the lower lip, the soft tissue anterior and inferior to the mandible, and the soft tissue inferior to the tongue.
- The tongue class also included the epiglottis and the hyoid bone.

Pixels not labelled as belonging to one of these classes were considered to belong to the background. All GT segmentations were created by the MRI physicist with four years of speech MRI experience.

Dataset structure
Images are contained in the MRI_SSFP_10fps folder, in which each subfolder contains the images of a different volunteer. Each image is saved as a separate DICOM file named image_N.dcm.

Velopharyngeal closure labels are saved in velopharyngeal_closure.xslx, with the labels of each volunteer in a different sheet. The spreadsheet row corresponds to the image number (i.e. the label in row 1 is the label for image 1).

GT segmentations are contained in the GT_Segmentations folder, in which each subfolder contains the GT segmentations of a different volunteer. Each GT segmentation is saved as a separate MAT file named mask_N.mat. In each MAT file, pixel values correspond to classes as follows:
- 0 = background
- 1 = head
- 2 = soft palate
- 3 = jaw
- 4 = tongue
- 5 = vocal tract
- 6 = tooth space

References
[1] A.D. Scott, R. Boubertakh, M.J. Birch, M.E. Miquel, Towards clinical assessment of velopharyngeal closure using MRI: evaluation of real-time MRI sequences at 1.5 and 3 T, Br. J. Radiol. 85 (2012) e1083–e1092. https://doi.org/10.1259/bjr/32938996.

Citation
Please cite this work as follows: Ruthven M, Peplinski A, Adams D, King AP, Miquel ME (2023) Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Scientific Data, under revision, manuscript SDATA-23-01169.
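The labelling and file-naming conventions described in the abstract can be exercised with a short sketch. It is a minimal, standard-library-only illustration under stated assumptions, not code distributed with the dataset: it encodes the pixel-value-to-class mapping, derives approximate series durations from the image counts and the 0.1 s temporal resolution, takes a majority vote over four raters' 0/1 labels, and builds image/mask paths for a frame. The description does not say how 2-2 ties between the four raters were resolved, so `majority_label` returns `None` in that case (an assumption). The `frame_paths` helper is hypothetical: the volunteer subfolder names are not given, and 1-based frame numbering is inferred from the spreadsheet convention.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Sequence, Tuple

# Pixel value -> class name, as listed in the dataset description.
CLASS_NAMES: Dict[int, str] = {
    0: "background",
    1: "head",
    2: "soft palate",
    3: "jaw",
    4: "tongue",
    5: "vocal tract",
    6: "tooth space",
}

# Images per series as stated in the description. Which volunteer subfolder
# each count belongs to is not stated, so only the list order is used here.
SERIES_LENGTHS: List[int] = [105, 71, 71, 78, 67]
FRAME_INTERVAL_S = 0.1  # temporal resolution: one image every 0.1 s


def series_durations_s() -> List[float]:
    """Approximate duration of each series (number of images x 0.1 s)."""
    return [round(n * FRAME_INTERVAL_S, 1) for n in SERIES_LENGTHS]


def majority_label(votes: Sequence[int]) -> Optional[int]:
    """Majority of four 0/1 velopharyngeal closure votes.

    Returns None on a 2-2 tie: the description does not say how ties
    were resolved, so this behaviour is an assumption of the sketch.
    """
    counts = Counter(votes)
    if counts[1] > counts[0]:
        return 1
    if counts[0] > counts[1]:
        return 0
    return None


def class_pixel_counts(mask: Sequence[Sequence[int]]) -> Dict[str, int]:
    """Count the pixels of each class in a 2-D mask given as nested rows."""
    counts = Counter(value for row in mask for value in row)
    return {CLASS_NAMES[value]: n for value, n in sorted(counts.items())}


def frame_paths(volunteer: str, n: int) -> Tuple[str, str]:
    """Hypothetical helper: relative (image, mask) paths for frame n.

    `volunteer` stands in for a subfolder name (not given in the
    description); n is assumed to be 1-based, like the spreadsheet rows.
    """
    image = PurePosixPath("MRI_SSFP_10fps", volunteer, f"image_{n}.dcm")
    mask = PurePosixPath("GT_Segmentations", volunteer, f"mask_{n}.mat")
    return str(image), str(mask)


print(sum(SERIES_LENGTHS))            # 392
print(series_durations_s())           # [10.5, 7.1, 7.1, 7.8, 6.7]
print(majority_label([1, 1, 0, 1]))   # 1
```

To inspect the real files, the MAT masks can be loaded with `scipy.io.loadmat`, which returns a dictionary of variables. The name of the variable holding the mask is not stated in the description, so it should be checked after loading. The DICOM images can be read with a library such as pydicom (`pydicom.dcmread`).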

Funding: NIHR (grant reference number ICA-CDRF-2018-04-ST2-032) and Barts Health Charity (grant reference number MGU0600).

Related Organizations

- King's College London, United Kingdom
- Barts Health NHS Trust, United Kingdom
- Guy's and St Thomas' NHS Foundation Trust, United Kingdom
- Queen Mary University of London, United Kingdom
- St Thomas' Hospital, United Kingdom



Impact (BIP!)
- Selected citations: 2
- Popularity: Average
- Influence: Average
- Impulse: Average