
This is the CM1-Dataset designed for the evaluation of information extraction from historical documents with Large Vision Language Models.

Paper: https://arxiv.org/abs/2505.04214
GitHub: https://github.com/fabiwo6/cm1

Abstract

The automatic extraction of key-value information from handwritten documents is a key challenge in document analysis. A reliable extraction is a prerequisite for the mass digitization efforts of many archives. Large Vision Language Models (LVLM) are a promising technology to tackle this problem especially in scenarios where little annotated training data is available. In this work, we present a novel dataset specifically designed to evaluate the few-shot capabilities of LVLMs. The CM1 documents are a historic collection of forms with handwritten entries created in Europe to administer the Care and Maintenance program after World War Two. The dataset establishes three benchmarks on extracting name and birthdate information and, furthermore, considers different training set sizes. We provide baseline results for two different LVLMs and compare performances to an established full-page extraction model. While the traditional full-page model achieves highly competitive performances, our experiments show that when only a few training samples are available the considered LVLMs benefit from their size and heavy pretraining and outperform the classical approach.

Annotations

cm1_cover_*.json:
"document_id": [{"Name": "last_name_person_1", "Vorname": "first_name_person_1", "Geb-Dat": "birth_date_person_1"}, {"Name": "last_name_person_2", "Vorname": "first_name_person_2", "Geb-Dat": "birth_date_person_2"}],

cm1_namedate_*.txt:
cluster_id/document_id.jpg first_name last_name birth_date
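The two annotation formats described above could be read roughly as follows. This is a minimal sketch, not the loader from the GitHub repository. It assumes that each cm1_cover_*.json file is a single JSON object mapping document IDs to lists of persons, and that each line of a cm1_namedate_*.txt file holds exactly four whitespace-separated fields, so names or dates containing spaces would need different handling. The function names and all sample values are hypothetical.

```python
import json


def load_cover_annotations(text):
    """Parse the text of a cm1_cover_*.json file.

    Assumption: the top level is an object {document_id: [person, ...]}.
    The German keys are mapped to English ones:
    Name -> last_name, Vorname -> first_name, Geb-Dat -> birth_date.
    """
    data = json.loads(text)
    records = {}
    for document_id, persons in data.items():
        records[document_id] = [
            {
                "last_name": person.get("Name", ""),
                "first_name": person.get("Vorname", ""),
                "birth_date": person.get("Geb-Dat", ""),
            }
            for person in persons
        ]
    return records


def parse_namedate_line(line):
    """Parse one line of a cm1_namedate_*.txt file.

    Assumption: exactly four whitespace-separated fields in the order
    cluster_id/document_id.jpg first_name last_name birth_date.
    """
    parts = line.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    path, first_name, last_name, birth_date = parts
    cluster_id, _, file_name = path.partition("/")
    document_id = file_name.rsplit(".", 1)[0]  # drop the .jpg extension
    return {
        "cluster_id": cluster_id,
        "document_id": document_id,
        "first_name": first_name,
        "last_name": last_name,
        "birth_date": birth_date,
    }


# Invented sample values, only to illustrate the two layouts.
sample_json = (
    '{"doc_0001": [{"Name": "Mustermann", "Vorname": "Max", '
    '"Geb-Dat": "01.02.1920"}]}'
)
cover = load_cover_annotations(sample_json)
entry = parse_namedate_line("cluster_07/doc_0001.jpg Max Mustermann 01.02.1920")
```

Raising an error on lines that do not have four fields is a deliberate choice here: it surfaces multi-part names instead of silently assigning them to the wrong field.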
