Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
Data sources: Datacite

DARE Database

Authors: Dahl, Christian Møller; Sørensen, Emil; Wittrock, Simon Friis; Westermann, Christian; Johansen, Torben Skov Dyg

Abstract

The DARE Database is a collection of images of handwritten dates derived from different historical sources from Sweden and Denmark. Additional details are available on our GitHub and on arXiv.

The dataset provides seven splits, one per data source. Each folder contains the respective minipics and their labels, split into test and training files. The numbers of files and tokens are:

Train images: 2,876,752
Test images: 152,414
Total number of images: 3,029,166
Total number of tokens: 9,682,027

The following table breaks these numbers down by source:

Dataset                        Sequence     Training observations   Test observations
Death Certificates (1)         DD-MM-YYYY                  11,627               1,000
Death Certificates (2)         DD-MM-YYYY                 155,439               8,338
Police Records (1)             DD-MM-YY                 1,006,199              53,488
Police Records (2)             DD-MM-YY                   326,478              17,103
Swedish Records Birth Dates    DD-MM-YY                   597,756              31,389
Swedish Records Death Dates    DD-MM                      547,813              28,803
Funeral Records                DD-MM                      231,440              12,293

Note that, due to data restrictions, the CIHVR images are excluded (we do not have permission to share them publicly).

The images consist purely of digits, with one exception: the month in a date sequence is sometimes written with alphabetic characters, e.g., "February" or "Feb".

The original images were acquired from Copenhagen Archives, the National Archives of Denmark, and Lund University. The minipics are created using Coherent Point Drift to extract the regions of interest from the source documents.

Many of the Swedish cause of death records are labelled as either empty or partly empty. A partly empty label, e.g., ' 29-" ', means that the month cell is in fact not empty: the month is the same as in the entry above. Historical tabulated records commonly use a special mark to denote "same as above".

Cells labelled ' ,-,-, ' (birth dates) or ' ,-, ' (death dates) are completely empty and could be excluded when training pure digit recognition models. When transcribing historical records, however, empty cells occur frequently and should be taken into account in one way or another.
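The label conventions described above can be turned into a small filtering step. The following Python sketch is an illustration only, not the authors' code. It assumes that a label is a string whose fields are separated by "-", that a lone "," marks an empty field, and that a lone '"' marks "same as above". These assumptions are read off the three example labels in the abstract and have not been checked against the actual label files. The function names are hypothetical.

```python
from typing import Iterable, List

# Assumed markers, inferred from the example labels ' ,-,-, ' and ' 29-" '.
EMPTY_FIELD = ","   # assumption: a lone comma marks an empty cell
DITTO_FIELD = '"'   # assumption: a lone double quote marks "same as above"


def split_fields(label: str) -> List[str]:
    """Split a label into its date fields, ignoring surrounding whitespace."""
    return [part.strip() for part in label.strip().split("-")]


def classify_label(label: str) -> str:
    """Return 'empty', 'partly_empty', or 'complete' for one label."""
    fields = split_fields(label)
    # Every field is blank or the empty marker: the cell is completely empty.
    if all(field in ("", EMPTY_FIELD) for field in fields):
        return "empty"
    # At least one field is blank, empty, or a ditto mark.
    if any(field in ("", EMPTY_FIELD, DITTO_FIELD) for field in fields):
        return "partly_empty"
    return "complete"


def drop_empty(labels: Iterable[str]) -> List[str]:
    """Keep only labels that are not completely empty."""
    return [label for label in labels if classify_label(label) != "empty"]
```

A pure digit recognition model could train on the output of `drop_empty`, while a transcription pipeline would more likely keep all three classes and treat "empty" and "partly_empty" as valid outputs.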

Country
Denmark
  • Impact by BIP!
    selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.