ARTS Datasets - ARTS94, ARTS300, ARTS3000, ARTS160

Datasets for readability and text simplicity evaluation in four sizes: 94, 300, 3000 and 160 disjoint data entries. Each data entry contains the following information:

Text_original: Text from a parallel corpus for text simplification.
Text_formatted: Text_original with formatting issues resolved, either manually (ARTS94) or automatically (ARTS300, ARTS3000, ARTS160).
Dataset: The parallel corpus for text simplification from which the original text was extracted.
Label: Whether the text comes from the simplified (simp) or source (src) part of the corpus.
ID: Unique ID.
Score: Simplicity/readability score of the formatted text, between 0 and 1; the higher the score, the more complex and less readable the text.

The licenses of the respective source datasets apply to their texts.
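The entry layout described above can be sketched as a small Python record type. This is only an illustration: the field names come from the description, but the on-disk file format, the column types (for example, whether ID is numeric), and the helper names `ArtsEntry`, `entry_from_record` and `rank_by_simplicity` are assumptions, not part of the published datasets.

```python
from dataclasses import dataclass

# Label values named in the dataset description.
VALID_LABELS = {"simp", "src"}


@dataclass(frozen=True)
class ArtsEntry:
    """One data entry (hypothetical container, not an official API)."""
    text_original: str
    text_formatted: str
    dataset: str
    label: str
    id: str
    score: float

    def __post_init__(self):
        # Enforce the constraints stated in the description.
        if self.label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {self.label!r}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {self.score}")


def entry_from_record(record: dict) -> ArtsEntry:
    """Build an entry from a dict keyed by the documented field names.

    Assumption: records arrive as plain dicts (e.g. rows read from a
    CSV or JSON file); the actual file format is not stated in the source.
    """
    return ArtsEntry(
        text_original=record["Text_original"],
        text_formatted=record["Text_formatted"],
        dataset=record["Dataset"],
        label=record["Label"],
        id=str(record["ID"]),
        score=float(record["Score"]),
    )


def rank_by_simplicity(entries):
    """Sort entries from simplest to most complex (lower score = simpler)."""
    return sorted(entries, key=lambda e: e.score)
```

Because a higher score means a more complex text, sorting in ascending order places the most readable texts first.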

Related Organizations

Leibniz Association (Germany)
Technische Hochschule Mittelhessen (Germany)
TH Köln – University of Applied Sciences (Germany)
Herder Institute (Germany)

Research products

ARTS Datasets - ARTS94, ARTS300, ARTS3000 (2024, HasVersion)

Impact by BIP!

Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Related to Research communities

Hessian Open Science Portal