
Handwritten document layout analysis remains a challenging task due to the high variability in writing styles, page structures, and document degradations. Existing datasets often lack sufficient layout diversity, as they are typically sourced from homogeneous collections with similar structural patterns. This limitation hinders the development of robust models capable of generalizing to real-world scenarios. To address this issue, we introduce a new dataset of handwritten documents collected from Wikimedia Commons, representing a broad spectrum of historical and modern documents with varying layouts, languages, and writing conditions. Each document is annotated for layout analysis, with identified page segments and corresponding labels. While the dataset is not intended for large-scale model training, it serves as a valuable benchmark for evaluating layout analysis methods and identifying generalization challenges. By prioritizing layout diversity, this dataset provides a realistic testbed for advancing handwritten document segmentation and structural analysis, ultimately contributing to the development of more adaptable and reliable document processing systems.
