2 Research products

  • Open Access German
    Laurent Romary; Werner Wegstein;
    Publisher: OpenEdition
    Country: France

    Our paper outlines a proposal for the consistent modeling of heterogeneous lexical structures in semasiological dictionaries, based on the element structures described in detail in chapter 9 (Dictionaries) of the TEI Guidelines. The core of our proposal describes a system of relatively autonomous lexical “crystals” that can, within the constraints of the relevant element’s definition, be combined to form complex structures for the description of morphological form, grammatical information, etymology, word-formation, and meaning for a lexical structure. The encoding structures we suggest guarantee sustainability and support re-usability and interoperability of data. This paper presents case studies of encoding dictionary entries in order to illustrate our concepts and test their usability. We comment on encoding issues involving <entry>, <form>, <etym>, and on refinements to the internal content of <sense>.
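
    As a rough illustration of how such "crystals" might be combined, the sketch below assembles a dictionary entry from independently built <form>, <gramGrp>, <etym>, and <sense> elements. It is not taken from the paper: the helper name `crystal`, the sample entry for German "Haus", and the choice of child elements are assumptions made for this example, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET


def crystal(tag, text=None, children=(), **attrs):
    """Build one self-contained element (hypothetical helper, not a TEI API)."""
    elem = ET.Element(tag, attrs)
    elem.text = text
    elem.extend(children)
    return elem


# Each block is built on its own, then combined; the content is illustrative only.
form = crystal("form", type="lemma", children=[crystal("orth", "Haus")])
gram = crystal("gramGrp", children=[crystal("pos", "noun"), crystal("gen", "neuter")])
etym = crystal("etym", children=[crystal("lang", "Old High German"),
                                 crystal("mentioned", "hūs")])
sense = crystal("sense", n="1",
                children=[crystal("def", "a building for people to live in")])

# The entry is simply the ordered combination of the autonomous blocks.
entry = crystal("entry", children=[form, gram, etym, sense])
print(ET.tostring(entry, encoding="unicode"))
```

    Because each block is constructed separately, the same <etym> or <sense> builder could in principle be reused across entries, which is the kind of re-usability the abstract points to.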

  • Publication . Conference object . 2016
    Banski, Piotr; Gaiffe, Bertrand; Lopez, Patrice; Meoni, Simon; Romary, Laurent; Schmidt, Thomas; Peter Stadler; Witt, Andreas;
    Country: France

    The paper provides an overview of, and an update on, the ongoing proposal to create a component within the TEI architecture. It sets out the conceptual background of having stand-off annotations embedded within a TEI document and the consequences in terms of primary source preservation, multiple annotation views, and possible export of annotation content into autonomous TEI documents. It demonstrates the various types of possible use cases, ranging from manual annotation to fully automated information extraction processes, and shows the importance of implementing, from the outset, the possibility to use any kind of internal or external vocabulary for representing annotation bodies (e.g. to deal with structural or conceptual annotations). An important prospect here is that the construct could lead to a simplified development of TEI-aware online services such as Named Entity Recognisers.

    We relate the proposal to ongoing initiatives and show the necessity to align with the Web Annotation Data Model (W3C) as well as with the recent introduction of the element for speech transcription (as part of the work carried out in the ISO standard 24624) as an elementary annotation crystal in the sense of Romary and Wegstein (2012). In this context we tackle the issue of implicitness in the representation of annotations and open the debate on the trade-off between a terse and a highly flexible model.

    We conclude by illustrating how the current proposal is already applied in various projects related to data mining or scientific information, and in particular to the representation of annotated scholarly content.

    Further materials
    • Minutes of the January 2014 meeting: ,%2001.2014/standoff-minutesBerlin2014.pdf
    • The TEI GitHub ticket:
    • The standOff proposal on GitHub: (branch AnnArbor)

    References
    • Bański, Piotr (2010). Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference, 2010. Vol. 5 of Balisage Series on Markup Technologies.
    • ISO/DIS 24624. Language resource management -- Transcription of spoken language.
    • Pose, Javier, Patrice Lopez and Laurent Romary (2014). A Generic Formalism for Encoding Stand-off annotations in TEI.
    • Romary, Laurent (2015). TEI challenges in an accelerating digital world. DiXiT Convention week, Sep 2015, The Hague, Netherlands.
    • Romary, Laurent and Werner Wegstein (2012). « Consistent Modeling of Heterogeneous Lexical Structures », Journal of the Text Encoding Initiative [Online], Issue 3 | November 2012, online since 15 October 2012, connection on 12 May 2016. URL: ; DOI: 10.4000/jtei.540 (section about Crystals: )
    • Web Annotation Data Model, W3C,
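
    To make the stand-off idea in the abstract above concrete, here is a minimal sketch in which annotations point into an untouched primary text by character offsets and can be filtered into separate views. It does not reproduce the proposal's actual TEI markup: the `Annotation` class, the layer names, the sample sentence, and the toy string-matching "recogniser" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset into the primary text (inclusive)
    end: int    # character offset into the primary text (exclusive)
    layer: str  # name of the annotation view, e.g. "ner"
    body: str   # label drawn from any internal or external vocabulary


def find_mentions(text, surface, layer, body):
    """Toy recogniser: annotate every occurrence of `surface` in `text`."""
    found = []
    pos = text.find(surface)
    while pos != -1:
        found.append(Annotation(pos, pos + len(surface), layer, body))
        pos = text.find(surface, pos + len(surface))
    return found


def view(text, annotations, layer):
    """Return (annotated span, body) pairs for a single layer."""
    return [(text[a.start:a.end], a.body) for a in annotations if a.layer == layer]


# The primary source is never modified; all annotations live beside it.
primary = "Laurent Romary presented the proposal in Berlin."
standoff = (find_mentions(primary, "Laurent Romary", "ner", "person")
            + find_mentions(primary, "Berlin", "ner", "place")
            + find_mentions(primary, "proposal", "terms", "concept"))

print(view(primary, standoff, "ner"))
# [('Laurent Romary', 'person'), ('Berlin', 'place')]
```

    Keeping the annotation list separate from the text is what allows several views of the same source, and the `body` field is deliberately a free string so that any vocabulary can be plugged in.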
