ZENODO
Conference object . 2014
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2014
License: CC BY
Data sources: ZENODO

Merging Data, The Essence Of Creation Of Multi-Layer Corpora

Authors: Zipser, Florian; Frank, Mario; Schmolling, Jakob

References

Dipper, S. (2005). XML-based Stand-off Representation and Exploitation of Multi-Level Linguistic Annotation. In: Eckstein, R., Tolksdorf, R. (eds.) Berliner XML Tage.
Ide, N. & Suderman, K. (2007). GrAF: A Graph-based Format for Linguistic Annotations. In: Proceedings of the Linguistic Annotation Workshop, Prague, Czech Republic.
Reznicek, M., Lüdeling, A., Krummes, C., Schwantuschke, F., Walter, M., Schmidt, K., Hirschmann, H. & Andreas, T. (2012). Das Falko-Handbuch. Korpusaufbau und Annotationen. Version 2.01.
Schmid, H. (1995). Improvements in Part-of-Speech Tagging with an Application to German. Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland.
Stede, M. (2004). The Potsdam commentary corpus. In: Proceedings of the 2004 ACL Workshop on Discourse Annotation (DiscAnnotation '04), Bonnie Webber and Donna Byron (eds.). Association for Computational Linguistics, Stroudsburg, PA, USA, 96-102.
Telljohann, H., Hinrichs, E. W., Kübler, S., Zinsmeister, H. & Beck, K. (2009). Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Universität Tübingen, Seminar für Sprachwissenschaft.
Zeldes, A., Ritz, J., Lüdeling, A. & Chiarcos, C. (2009). ANNIS: A Search Tool for Multi-Layer Annotated Corpora. In: Proceedings of Corpus Linguistics 2009, July 20-23, Liverpool, UK.
Zipser, F. & Romary, L. (2010). A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta. URL: http://hal.archives-ouvertes.fr/inria-00527799/en/
Zipser, F., Zeldes, A., Ritz, J., Romary, L. & Leser, U. (2011). Pepper: Handling a multiverse of formats. 33. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft. Göttingen, 23.-25. Februar 2011.

Abstract

The last couple of years have seen an increasing number of multi-layer corpora. Such corpora allow the analysis of phenomena that spread across multiple annotation layers. For example, corpora like TueBaDZ (see: http://www.sfs.uni-tuebingen.de/ascl/ressourcen/corpora/tueba-dz.html), PCC (see: http://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html), FALKO (see: https://u.hu-berlin.de/falko) and many others contain annotations on syntactic, rhetorical, information-structural and other layers. Often, the annotations were created manually or semi-automatically with different tools such as EXMARaLDA (see: http://www.exmaralda.org/), TreeTagger (see: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) and MMAX2 (see: http://mmax2.sourceforge.net/). These tools are powerful and usable, but unfortunately provide only a minimum of interoperability, which impedes the creation of multi-layer corpora. As a result, the layers of such corpora often had to be merged by hand or by highly specific scripts written for a single use case, which could not easily be reused for other corpora. With this poster we present a tool that merges several layers of annotation into a single multi-layer corpus. When a multi-layer corpus is created, several analyses are based on the same primary data and often also on the same tokenization. We therefore start merging the data at the tokenization level and traverse bottom-up to merge the higher levels of annotation as well. This concept is implemented as a module for the converter framework Pepper (see: https://u.hu-berlin.de/saltnpepper), using the common meta-model Salt. By using Pepper, the merging module is able to handle all formats that can be imported by a Pepper module. Multi-layer corpora can then be mapped to a multi-layer format such as PAULA (Chiarcos et al. 2008) or GrAF (Ide & Suderman 2007), or can be imported into ANNIS.
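
The merging idea described in the abstract (align layers on shared primary data and tokenization first, then carry over higher-level annotations bottom-up) can be illustrated with a minimal sketch. This is not the Pepper module and does not use the Salt API: the `Layer` class, the `merge_layers` function and the offset-based token keys are hypothetical names introduced only for illustration, and the sketch assumes that tokens from different layers can be identified by identical character offsets.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Offsets = tuple[int, int]  # (start, end) character offsets into the primary text


@dataclass
class Layer:
    """One annotation layer over a primary text (hypothetical, not a Salt object)."""
    text: str
    # Token-level annotations, keyed by the token's character offsets.
    tokens: dict[Offsets, dict[str, str]] = field(default_factory=dict)
    # Higher-level nodes: a label plus the offsets of the tokens they cover.
    spans: list[tuple[str, list[Offsets]]] = field(default_factory=list)


def merge_layers(base: Layer, other: Layer) -> Layer:
    """Merge `other` into a copy of `base`, starting at the token level."""
    # Both layers must annotate the same primary data.
    if base.text != other.text:
        raise ValueError("layers do not share the same primary text")

    merged = Layer(
        text=base.text,
        tokens={offsets: dict(annos) for offsets, annos in base.tokens.items()},
        spans=[(label, list(covered)) for label, covered in base.spans],
    )

    # Step 1: token level. Tokens with identical offsets are treated as the
    # same token and their annotations are combined; other tokens are added.
    for offsets, annos in other.tokens.items():
        merged.tokens.setdefault(offsets, {}).update(annos)

    # Step 2: bottom-up. Higher-level nodes refer to tokens by offsets, so
    # they can be carried over once the token level has been merged.
    for label, covered in other.spans:
        merged.spans.append((label, list(covered)))

    return merged


text = "Pepper merges layers"
pos_layer = Layer(text, {(0, 6): {"pos": "NNP"}, (7, 13): {"pos": "VBZ"}, (14, 20): {"pos": "NNS"}})
syntax_layer = Layer(
    text,
    {(0, 6): {"lemma": "Pepper"}},
    spans=[("NP", [(0, 6)]), ("VP", [(7, 13), (14, 20)])],
)
merged = merge_layers(pos_layer, syntax_layer)
```

In this toy example the merged layer holds both the `pos` and the `lemma` annotation on the first token, plus the two syntactic spans taken from the second layer. A real merge also has to deal with differing tokenizations and conflicting annotation names, which this sketch deliberately leaves out.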

Keywords

ANNIS, Pepper, merging, Salt, linguistic data, linguistics, linguistic format, multi-layer corpus, conversion framework, corpora, converter
