
PoeTree (Poetry Treebanks) is a dataset of over 330,000 poems (89,000,000 tokens) in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Spanish, Slovenian, and Russian). Each corpus has been deduplicated, annotated with Universal Dependencies, enriched with additional metadata, and converted into a unified JSON structure (schema available at https://versologie.cz/poetree/json-schema).

- cs (~80k poems), derived from Corpus of Czech Verse
- de (~74k poems), derived from Metricalizer and Deutsches Lyrik Korpus
- en (~40k poems), based on texts from Project Gutenberg
- es (~9k poems), derived from Corpus of Spanish Golden-Age Sonnets and Diachronic Spanish Sonnet Corpus
- fr (~18k poems), derived from Malherbə
- hu (~13k poems), derived from ELTE Poetry Corpus
- it (~40k poems), derived from Biblioteca Italiana
- pt (~5k poems), derived from Poemas
- ru (~45k poems), derived from Corpus of Russian Poetry
- sl (~5k poems), based on texts from Wikisource

New in v. 0.0.2:

- PoeTree.sl added
- PoeTree.de enriched with Deutsches Lyrik Korpus
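Because every corpus shares one JSON structure with Universal Dependencies annotation, a single script can in principle process all of the language corpora alike. The Python sketch below is a minimal illustration and is not part of PoeTree. It assumes, hypothetically, that each annotated token is a JSON object carrying its UD part-of-speech tag under a key named `upos`; check the real field names against the published schema before relying on it. Instead of hard-coding how poems, stanzas, and lines are nested, it walks the whole document and tallies every object that has that key.

```python
import json
from collections import Counter
from typing import Any, Iterator


def iter_tokens(node: Any, key: str = "upos") -> Iterator[dict]:
    """Yield every dict anywhere in `node` that contains `key`.

    Walking the full tree avoids assuming a particular nesting
    (poem -> stanza -> line -> token), which is not confirmed here.
    """
    if isinstance(node, dict):
        if key in node:
            yield node
        # Keep descending: tokens may sit below other objects.
        for value in node.values():
            yield from iter_tokens(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tokens(item, key)


def upos_counts(json_text: str, key: str = "upos") -> Counter:
    """Count part-of-speech tags in one JSON document.

    `key` defaults to "upos", a hypothetical field name chosen to mirror
    the UPOS column of the CoNLL-U format.
    """
    data = json.loads(json_text)
    return Counter(token[key] for token in iter_tokens(data, key))


if __name__ == "__main__":
    # Toy input with an invented layout; real PoeTree files may differ.
    sample = (
        '{"title": "Example", "body": [['
        '{"form": "Roses", "upos": "NOUN"},'
        '{"form": "are", "upos": "AUX"},'
        '{"form": "red", "upos": "ADJ"}]]}'
    )
    print(upos_counts(sample))
```

For a real file, something like `upos_counts(open("poems.json", encoding="utf-8").read())` would apply the same tally, where `poems.json` is a placeholder name. The recursive walk trades some speed for independence from the exact schema; once the field layout is confirmed, direct indexing would be simpler and faster.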
