ZENODO
Dataset . 2023
Data sources: Datacite
Data sources: ZENODO

MHG4SNA: Middle high german texts annotated for social network analysis

Authors: Nora Ketschik

Abstract

Description

This corpus contains multiple Middle High German texts with annotations for social network analysis. It contains annotations of named entities and entity mentions (including partial coreference resolution), direct speech, and narrator's comments. See below for further description.

The annotated texts are part of my dissertation on social network analysis of Arthurian romances. The research was developed in the context of the DH center CRETA at the University of Stuttgart.

Texts

Wolfram von Eschenbach: 'Parzival', in: Wolfram von Eschenbach: Werke, ed. by Karl Lachmann, 5th edition, Berlin 1891, pp. 11–388.
Hartmann von Aue: 'Erec', ed. by Albert Leitzmann, continued by Ludwig Wolff, 7th edition by Kurt Gärtner, Tübingen 2006 (Altdeutsche Textbibliothek 39).
Hartmann von Aue: 'Iwein', ed. by G. F. Benecke and K. Lachmann, revised by Ludwig Wolff, 7th edition, part 1: Text, Berlin 1968.
Wolfram von Eschenbach: 'Willehalm', in: Wolfram von Eschenbach: Werke, ed. by Karl Lachmann, 5th edition, Berlin 1891, pp. 421–640.
'Das Rolandslied des Pfaffen Konrad', ed. by Carl Wesle, 3rd edition by Peter Wapnewski, Tübingen 1985 (Altdeutsche Textbibliothek 69).

All texts are part of the MHDBDB (Mittelhochdeutsche Begriffsdatenbank).

Annotations

The texts contain annotations of different categories, as described in the following sections.

1. Named Entities and Entity Mentions

I annotated all named entities and entity mentions that belong to the categories PER and LOC. PER stands for 'person' and refers to real persons as well as fictional characters. LOC stands for 'location' and includes real and fictional places. I annotated named entities (e.g. 'Parzival' as PER or 'Nantes' as LOC) as well as entity mentions referring to an instance of PER or LOC (e.g. 'the knight' for Parzival, or 'the city' for Nantes). I did not annotate pronouns. Entity references can contain multiple words, e.g. 'the lovely queen Ginover', and they can be nested, e.g. '[the son of [the king Gahmuret]]'. The annotations follow the guidelines created for multiple categories and disciplines in the context of CRETA. They are published here.

2. Entity Grounding

All annotated entity references are mapped to the entity instance that they refer to. For example, the references 'Parzival', 'Herzeloyde's son', 'the young man', 'the red knight' etc. all refer to the character instance 'Parzival'. The entity grounding takes the context of each entity mention into consideration, since one and the same expression can refer to different instances (in one context 'the king' refers to Arthur, in another context to Gahmuret).

3. Direct Speech (DS)

Passages of direct speech have been annotated by detecting quotation marks. They are tagged as 'DS'. There are a few cases of embedded direct speech (passages of direct speech containing another passage of direct speech); these cases are annotated as well.

4. Narrator's comments (EK)

As an additional category I annotated passages that contain statements of the narrator, narrator's comments, extensive descriptions or digressions (e.g. an excursus on a specific topic). These passages are not part of the fictional world or lead to a pause in the timeline of events. They are annotated as 'EK' or 'EK2': 'EK' marks passages that are not part of the diegesis, 'EK2' marks passages that lead to a pause, e.g. comments or descriptions.

5. Segmentation

The texts are subdivided into passages of 30 verses. Since some editions ('Parzival', 'Willehalm') contain a formal segmentation into passages of 30 verses each, the same kind of segmentation has been transferred to the other texts. This means 'segment 1' contains the first 30 verses, 'segment 2' contains verses 31-60 and so on. According to the editions by Lachmann, 'Parzival' and 'Willehalm' are also subdivided into chapter-like books (Parzival: book 1 to 16, Willehalm: book 1 to 9). The other texts are similarly subdivided into chapter-like sections following common content-based divisions.
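The segment numbering described in the segmentation section amounts to simple integer arithmetic. The following is a minimal sketch, assuming verses are counted continuously from 1 within a text; the function names `segment_number` and `segment_range` are illustrative and not part of the dataset, and the editions' own verse numbering schemes may differ from this continuous count.

```python
def segment_number(verse: int, segment_size: int = 30) -> int:
    """Return the 1-based segment that contains a 1-based verse number."""
    if verse < 1:
        raise ValueError("verse numbers are assumed to start at 1")
    return (verse - 1) // segment_size + 1


def segment_range(segment: int, segment_size: int = 30) -> tuple[int, int]:
    """Return the first and last verse number (inclusive) of a segment."""
    if segment < 1:
        raise ValueError("segment numbers are assumed to start at 1")
    first = (segment - 1) * segment_size + 1
    return first, first + segment_size - 1


if __name__ == "__main__":
    # Verse 31 opens segment 2, which spans verses 31-60.
    print(segment_number(31), segment_range(2))
```

In practice the conll and csv files already carry the segment number for every token or entity reference, so this calculation is only needed when aligning external verse references with the corpus.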
Social Network Analysis

The data can be used to explore and analyse the social network of the texts. SNA can be performed via Gephi [4] using the gexf files. The social network is based on co-occurrences using a) the annotated and grounded entities, and b) the text segmentation in segments of 30 verses each. A relation between two or more entities is extracted whenever they co-occur in a segment.

Data downloads

The annotated texts can be downloaded in multiple formats: conll, csv, and gexf.

1. Conll

The files contain seven columns:
(1) token
(2) POS tag, assigned by a Middle High German POS tagger
(3) number of the segment
(4) entity reference annotation, indicating the instance that the entity reference refers to; '-' if there is no entity reference
(5) EK: '1' if there is an annotation of 'EK', '0' if not
(6) EK2: '1' if there is an annotation of 'EK2', '0' if not
(7) DS: '1' if the token is tagged as direct speech, '0' if not

2. Csv

The csv files contain all annotations of the category PER including entity grounding. The files contain the following columns:
begin and end: start and end of the entity reference expression (character offsets)
doc_id: document id
buch: book number
quote: entity reference expression
coref: the entity instance that the expression refers to
overlap: indicates whether there is an overlap (relevant for embedded entities)
ek and ek2: narrator's comment
ds: direct speech
space: annotations of the space where the story takes place (can be ignored here)
segnr: number of the segment
em: embedded
klasse: entity class
xrange: technical, relevant for the annotation view

3. Gexf

These files can be used to import the data into Gephi. They are based on the annotation and grounding of entities (category PER). A relation between entities is based on co-occurrence: whenever two or more entities co-occur in a segment, they have a relation, and the more often they co-occur, the stronger their relation becomes.
The text segmentation is described above. Embedded entities are excluded. Entities mentioned in direct speech (DS) or in comments (EK) can optionally be selected or deselected. These optional filters are indicated in the file names. To visualize the graph dynamically, one can use the text segmentation as a timeline.

Release v1.0.0: data publication in the context of my dissertation.
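The co-occurrence rule and the optional DS/EK filters can be illustrated with a small sketch over the seven-column conll layout. It rests on several assumptions that the description does not confirm: that columns are tab-separated, that the file has no header row, that column 4 holds exactly one instance name or '-', and that the edge weight is the number of segments in which two entities co-occur. The function names and the sample rows are invented for illustration and are not taken from the corpus.

```python
import csv
from collections import Counter, defaultdict
from itertools import combinations


def entities_per_segment(lines, include_ds=True, include_ek=True):
    """Collect the set of grounded entities found in each segment.

    `lines` is any iterable of text lines in the assumed layout:
    token, POS, segment, entity, EK, EK2, DS (tab-separated).
    """
    segments = defaultdict(set)
    # QUOTE_NONE: tokens may themselves be quotation marks.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 7 or not row[2].isdigit():
            continue  # skip blank, malformed, or non-data lines
        _token, _pos, seg, entity, ek, ek2, ds = row
        if entity == "-":
            continue  # token is not part of an entity reference
        if not include_ds and ds == "1":
            continue  # optional filter: drop mentions in direct speech
        if not include_ek and (ek == "1" or ek2 == "1"):
            continue  # optional filter: drop mentions in narrator's comments
        segments[int(seg)].add(entity)
    return segments


def cooccurrence_weights(segments):
    """Count, for each unordered entity pair, the segments they share."""
    weights = Counter()
    for entities in segments.values():
        for a, b in combinations(sorted(entities), 2):
            weights[(a, b)] += 1
    return weights


# Invented sample rows (not real corpus data).
SAMPLE = [
    "Parzival\tNE\t1\tParzival\t0\t0\t0",
    "Artus\tNE\t1\tArtus\t0\t0\t1",
    "ritter\tNN\t2\tParzival\t0\t0\t0",
    "Artus\tNE\t2\tArtus\t0\t0\t0",
    "Gawan\tNE\t2\tGawan\t1\t0\t0",
]

if __name__ == "__main__":
    for pair, weight in sorted(cooccurrence_weights(entities_per_segment(SAMPLE)).items()):
        print(pair, weight)
```

Because entities are collected as a set per segment, repeated mentions of the same character within one segment count once; whether the published gexf files weight edges this way is not stated in the description.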

Keywords

annotation, social network analysis, middle high german, DH
