publication . Article . Other literature type . 2021

Converting raw transcripts into an annotated and turn-aligned TEI-XML corpus: the example of the Corpus of Serbian Forms of Address

Lemmenmeier-Batinić, Dolores
Open Access English
  • Published: 01 Jul 2021 Journal: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave, volume 9, issue 1 (ISSN: 2335-2736)
  • Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
  • Country: Switzerland
This paper describes the procedure of building a TEI-XML corpus of spoken Serbian starting from raw transcripts. The corpus consists of semi-structured interviews, which were gathered with the aim of investigating forms of address in Serbian. The interviews were thoroughly transcribed according to the GAT transcription conventions. However, the transcription was carried out without tools that would check the validity of the GAT syntax or align the transcript with the audio recordings. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the cor...
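
As a rough illustration of the kind of conversion the abstract describes, the sketch below wraps speaker turns from a raw transcript in TEI `<u>` (utterance) elements. It is not the author's pipeline. The input format (`SPEAKER: text` on each line), the speaker labels, the sample lines and the function name `transcript_to_tei` are all assumptions made for the example. GAT annotations such as `(.)` are passed through unchanged rather than parsed.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Assumed (hypothetical) raw line format: "<speaker label>: <utterance text>"
TURN_RE = re.compile(r"^(?P<who>[A-Za-z0-9]+):\s*(?P<text>.+)$")


def transcript_to_tei(lines):
    """Wrap each 'SPEAKER: text' line in a TEI <u> element inside a <body>."""
    ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace
    body = ET.Element(f"{{{TEI_NS}}}body")
    for line in lines:
        match = TURN_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # The who attribute points at a speaker ID, as in TEI speech transcription
        u = ET.SubElement(body, f"{{{TEI_NS}}}u", {"who": "#" + match["who"]})
        u.text = match["text"]  # GAT markup is kept verbatim, not interpreted
    return body


if __name__ == "__main__":
    # Invented sample turns, not taken from the corpus
    sample = ["INT: hello (.) how are you", "", "S1: fine thanks"]
    print(ET.tostring(transcript_to_tei(sample), encoding="unicode"))
```

A real conversion would also need to validate the GAT syntax, normalise the semi-orthographic forms and add audio alignment, none of which this sketch attempts.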
free text keywords: spoken Serbian, language biographical interviews, forms of address, data re-usability, Philology. Linguistics, P1-1091, Linguistics and Language, Language and Linguistics, Institute of Slavonic Studies, 490 Other languages, 410 Linguistics