This repository contains the corpus required for the synthetic data generation of DANIEL, which is available on GitHub and described in the paper "DANIEL: a fast document attention network for information extraction and labelling of handwritten documents", authored by Thomas Constum, Pierrick Tranouez, and Thierry Paquet (LITIS, University of Rouen Normandie). The paper has been accepted for publication in the International Journal on Document Analysis and Recognition (IJDAR) and is also accessible on arXiv.

This project is licensed under a custom Research Usage Only (RUO) license. Please refer to the LICENSE file for more details.

The contents of this archive should be extracted into the outputs/ directory of the DANIEL codebase. Each folder in the archive follows the naming convention daniel_<dataset>_strategy_X, where <dataset> refers to the target dataset used during training and strategy_X refers to the specific training strategy applied to obtain the corresponding model weights. For a detailed explanation of the training strategies, please refer to the DANIEL paper.

Selecting pre-trained weights for transfer learning

When performing transfer learning, choosing the right pre-trained weights is crucial for achieving optimal results. Below are the recommended weight options, depending on your target dataset and the amount of annotated data available:

1. daniel_iam_ner_strategy_A_custom_split
Training Data: Trained on all synthetic datasets and on all real datasets except M-POPP.
Best Use Case: Suitable when only a small amount of annotated data is available for the target dataset.
Attention Granularity: 32-pixel vertical granularity, meaning the encoder's output feature map has a height of H/32 (where H is the input image height).

2. daniel_multi_synth
Training Data: Trained exclusively on synthetic datasets (excluding M-POPP), with no real data. Used to initialize fine-tuning strategies A and B for the IAM/IAM NER, RIMES 2009, and READ 2016 datasets.
Best Use Case: Suitable for modern document datasets with several thousand annotated pages.
Attention Granularity: 32-pixel vertical granularity (H/32).

Citation Request

If you publish material based on these weights, we request that you include a reference to the paper:

« Constum, T., Tranouez, P. & Paquet, T., DANIEL: a fast document attention network for information extraction and labelling of handwritten documents. IJDAR (2025). https://doi.org/10.1007/s10032-024-00511-9 »
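As a concrete sketch of the extraction step described above, the snippet below unpacks the archive into the outputs/ directory of a local DANIEL checkout. The archive file name ("daniel_weights.tar.gz") and the path to the codebase ("DANIEL/") are placeholders for illustration, not names defined by this repository; adjust both to your local copies.

```python
import tarfile
from pathlib import Path

# Placeholder names: adjust to your downloaded archive and your
# clone of the DANIEL codebase.
archive = Path("daniel_weights.tar.gz")
outputs = Path("DANIEL") / "outputs"

# Make sure the target directory exists, then extract into it.
outputs.mkdir(parents=True, exist_ok=True)
if archive.exists():
    with tarfile.open(archive) as tf:
        tf.extractall(outputs)

# Each extracted folder should follow the daniel_<dataset>_strategy_X
# naming convention, e.g. daniel_iam_ner_strategy_A_custom_split.
print(sorted(p.name for p in outputs.iterdir()))
```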