Publication . Conference object . 2015

TEI across corpora, languages and genres: Towards a standard for the representation of social media and computer-mediated communication

Beisswenger, Michael; Chanier, Thierry; Ehrhardt, Eric; Herold, Axel; Lüngen, Harald; Poudat, Céline; Storrer, Angelika
Published: 28 Oct 2015
Publisher: HAL CCSD
Country: France

The panel presents results and ongoing work from corpus projects in which TEI-P5 has been adopted for the representation and linguistic annotation of genres of social media and computer-mediated communication (CMC). It relates to the work of the TEI-SIG “computer-mediated communication”, which is developing TEI models for the representation of CMC genres and testing these models on a broad range of genres (ranging from “text-only” genres such as chat and SMS to multimodal genres such as learning environments and Second Life) and in corpus-building initiatives for various European languages.

The goal of the panel is to give an overview of models and practices in representing CMC in TEI, using German and French CMC corpora as examples. Documentation and ODD files of the schemas developed by the group will be made available in the TEI wiki and announced via the TEI mailing list before the conference, so that everybody interested in participating in the discussion can examine the CMC models in advance.

The discussion in the panel will serve as an opportunity to collect feedback on these models and schema drafts from a broader community within the TEI that is interested in adapting TEI-P5 for the representation of new (digital) genres. This feedback will be taken into consideration when revising the models and, as a next step after the conference, when preparing feature requests for adapting the TEI for CMC.


TEI, Text Encoding Initiative, CMC, computer-mediated communication, corpora, [SHS.LANGUE] Humanities and Social Sciences/Linguistics
