ZENODO
Dataset, 2025
License: CC BY
Data sources: ZENODO, Datacite

Speech annotations of unscripted monologues

Authors: Jaeger, Manuela; Bleichner, Martin


Abstract

Contact information: manuela.jaeger@uni-oldenburg.de, martin.bleichner@uni-oldenburg.de

Context: This repository contains precisely time-resolved linguistic speech annotations derived from a published audiovisual dataset of unscripted monologues (Daeglau et al., 2023; https://doi.org/10.5281/zenodo.8082844). The annotations capture linguistic features at the word and phoneme level, aligned at fine temporal resolution, enabling accurate mapping between speech content and its temporal structure. This speech annotation dataset provides a robust foundation for investigating the dynamics of natural speech in neurolinguistic research.

File naming convention: Each folder contains files with a common overall structure, allowing a direct link between the audiovisual recordings dataset and the corresponding linguistic speech annotations. The data for each subject and take are stored in separate files. The file naming convention is as follows (a parsing sketch is given after this abstract):

Subject<subject_ID>_Take<take_num>_<add_feature>.<extension>

subject_ID: unique integer number for each subject, ranging from 1 to 6.
take_num: running integer index with leading zeros (01, 02, ...) indicating the take number.
add_feature: present only in takes where an additional feature was introduced:
"MG": babble noise was played over in-ear phones during the take
"LS": the subject wore lipstick during the take
"BR": the subject wore glasses during the take
extension: .json or .tsv

Data description: Each folder contains an _event.tsv file, a spreadsheet that describes the time-resolved linguistic annotation at the word and phoneme level. The event table has the following columns:

onset: Onset of the event in seconds.
duration: Duration of the event in seconds.
event_type: The main category of the event: "pause" (onset of a silent interval), "word" (onset of a word), or "phoneme" (onset of a phoneme).
mau: Phonetic segmentation of the event in X-SAMPA.
mas: Phonetic segmentation of the event in X-SAMPA including syllable segmentation; a period "." marks a new syllable.
kan: Canonical phonological transcript (standard pronunciation) of the event in X-SAMPA.
kas: Canonical phonological transcript (standard pronunciation) of the event in X-SAMPA including syllable segmentation; a period "." marks a new syllable.
ort: Orthographic representation of the event.
rftag: Part-of-speech tags based on RFTagger.
lemma: Base form (dictionary form) of the word.
token: Word index starting from 1.

Note that the onset of an event is given relative to the start of the audiovisual recording, while its duration is given relative to the onset of the event (see the loading example after this abstract).

The _event.json file is a sidecar that describes the categorical and value columns of the event spreadsheet in JSON format. Additionally, each folder includes a _HED.tsv event file. It contains the same event information as the _event.tsv file, but in machine-actionable HED format. HED (Hierarchical Event Descriptors) is a framework for annotating events in time-series data using a structured, standardised vocabulary.
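
The filename pattern above is reconstructed from the component list (the angle-bracket placeholders were lost in the original page text), and the file descriptions suggest an additional "_event"/"_HED" tag before the extension. Under those assumptions, a minimal Python sketch for splitting such filenames into their components could look like this; adjust the regular expression to the actual files in the repository.

```python
import re
from typing import Optional

# Assumed pattern: Subject<subject_ID>_Take<take_num>_<add_feature>_<filetype>.<extension>
# The optional feature tag ("MG", "LS", "BR") and the "_event"/"_HED" file-type
# tag are assumptions based on the description above; verify against real files.
FILENAME_RE = re.compile(
    r"^Subject(?P<subject_id>\d+)"
    r"_Take(?P<take_num>\d+)"
    r"(?:_(?P<add_feature>MG|LS|BR))?"
    r"_(?P<filetype>event|HED)"
    r"\.(?P<extension>json|tsv)$"
)

def parse_annotation_filename(name: str) -> Optional[dict]:
    """Split an annotation filename into its components; None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["subject_id"] = int(parts["subject_id"])   # 1..6
    parts["take_num"] = int(parts["take_num"])       # leading zeros dropped
    return parts

# Hypothetical filename, for illustration only:
print(parse_annotation_filename("Subject3_Take02_MG_event.tsv"))
# {'subject_id': 3, 'take_num': 2, 'add_feature': 'MG', 'filetype': 'event', 'extension': 'tsv'}
```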
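
Since the _event.tsv files are tab-separated spreadsheets with the columns described above, they can be read with standard tooling. Below is a minimal sketch using pandas, with a hypothetical filename, that loads one event table, derives each event's end time from the relative duration, and extracts the word-level rows.

```python
import pandas as pd

# Hypothetical path; substitute an actual _event.tsv file from the repository.
events = pd.read_csv("Subject1_Take01_event.tsv", sep="\t")

# onset is relative to the start of the audiovisual recording, while
# duration is relative to the event onset, so the absolute end time
# of each event is onset + duration.
events["offset"] = events["onset"] + events["duration"]

# Restrict to word-level events and inspect orthography, lemma, and timing.
words = events[events["event_type"] == "word"]
print(words[["token", "ort", "lemma", "onset", "offset"]].head())
```

The _HED.tsv files can be loaded the same way; interpreting their HED tags additionally requires HED-aware tooling.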

Keywords

Speech Transcription, Computational Linguistics, Spontaneous Speech, FOS: Languages and literature, Linguistics
