publication . Article . Other literature type . 2021

Converting raw transcripts into an annotated and turn-aligned TEI-XML corpus: the example of the Corpus of Serbian Forms of Address

Dolores Lemmenmeier-Batinić
Open Access English
  • Published: 01 Jul 2021
  • Publisher: Ljubljana University Press
  • Country: Switzerland
Abstract
This paper describes the procedure of building a TEI-XML corpus of spoken Serbian starting from raw transcripts. The corpus consists of semi-structured interviews, which were gathered with the aim of investigating forms of address in Serbian. The interviews were thoroughly transcribed according to the GAT transcription conventions. However, the transcription was carried out without tools that would check the validity of the GAT syntax or align the transcript with the audio recordings. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the cor...
Subjects
free text keywords: Institute of Slavonic Studies, 490 Other languages, 410 Linguistics, Linguistics and Language, Language and Linguistics, spoken Serbian, language biographical interviews, forms of address, data re-usability, Philology. Linguistics, P1-1091