
Datasets for readability and text simplicity evaluation in four sizes: 94, 300, 3000 and 160 disjoint data entries (ARTS94, ARTS300, ARTS3000 and ARTS160). Each data entry contains the following information:

- Text_original: Text from a parallel corpus for text simplification
- Text_formatted: Text_original with formatting issues resolved, either manually (ARTS94) or automatically (ARTS300, ARTS3000, ARTS160)
- Dataset: Parallel corpus for text simplification from which the original text was extracted
- Label: Whether the text comes from the simplified (simp) or source (src) part of the corpus
- ID: Unique ID
- Score: Simplicity/readability score of the formatted text, between 0 and 1; the higher the score, the more complex/less readable the text

The licenses of the respective source datasets apply to their texts.
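The entry fields described above can be modelled as a small record type. The following is a minimal sketch, not part of the dataset's own tooling: the class name `ArtsEntry`, the snake_case attribute names, the string type for the ID, and the helper `more_complex` are all assumptions made for illustration. Only the six fields, the `simp`/`src` label values, and the 0 to 1 score range come from the description.

```python
from dataclasses import dataclass

# Label values named in the description: simplified or source side of the corpus.
VALID_LABELS = {"simp", "src"}


@dataclass(frozen=True)
class ArtsEntry:
    """One data entry (hypothetical container; field names are assumptions)."""

    text_original: str   # Text_original
    text_formatted: str  # Text_formatted
    dataset: str         # Dataset: name of the parallel corpus
    label: str           # Label: "simp" or "src"
    entry_id: str        # ID: assumed to be a string here
    score: float         # Score: 0 (simple) to 1 (complex/less readable)

    def __post_init__(self):
        # Reject values outside what the description allows.
        if self.label not in VALID_LABELS:
            raise ValueError(f"label must be one of {sorted(VALID_LABELS)}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must lie between 0 and 1")


def more_complex(a: ArtsEntry, b: ArtsEntry) -> ArtsEntry:
    """Return the entry with the higher score, i.e. the less readable text."""
    return a if a.score >= b.score else b
```

Validating on construction keeps malformed rows from propagating silently when entries are loaded from a file; the actual file format and column names of the released data should be checked before relying on this layout.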
