DANIEL: a fast document attention network for information extraction and labelling of handwritten documents

Publication type: Article, Preprint

Date: 09 Jan 2025 (embargo end date: 01 Jan 2024)

Language: English

Publisher: Springer Science and Business Media LLC

Journal: International Journal on Document Analysis and Recognition (IJDAR), volume 28, pages 573-595 (issn: 1433-2833, eissn: 1433-2825)

Funded by: ANR | HAISCoDe, ANR | EXO-POPP

Authors: Thomas Constum; Pierrick Tranouez; Thierry Paquet

doi: 10.1007/s10032-024-00511-9 , 10.48550/arxiv.2407.09103

arXiv: 2407.09103

Abstract

Information extraction from handwritten documents traditionally involves three distinct steps: Document Layout Analysis, Handwritten Text Recognition, and Named Entity Recognition. Recent approaches have attempted to integrate these steps into a single process using fully end-to-end architectures. Even so, these integrated approaches have not yet matched the performance of language models applied to information extraction in plain text. In this paper, we introduce DANIEL (Document Attention Network for Information Extraction and Labelling), a fully end-to-end architecture that integrates a language model and is designed for comprehensive handwritten document understanding. DANIEL performs layout recognition, handwriting recognition, and named entity recognition on full-page documents. Moreover, it can learn simultaneously across multiple languages, layouts, and tasks. For named entity recognition, the ontology to be applied can be specified via the input prompt. The architecture pairs a convolutional encoder, capable of processing images of any size without resizing, with an autoregressive decoder based on a transformer language model. DANIEL achieves competitive results on four datasets, including new state-of-the-art performance on RIMES 2009 and M-POPP for Handwritten Text Recognition, and on IAM NER for Named Entity Recognition. Furthermore, DANIEL is much faster than existing approaches. We provide the source code and the weights of the trained models at https://github.com/Shulk97/daniel.
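To illustrate why a convolutional encoder can accept images of any size without resizing, the following minimal sketch computes the feature grid, and hence the number of visual tokens handed to the decoder, for a few page sizes. The layer stack (five 3x3 convolutions with stride 2 and padding 1) and the names `ASSUMED_LAYERS`, `feature_grid`, and `num_visual_tokens` are illustrative assumptions, not DANIEL's actual configuration or code; only the convolution output-size formula is standard.

```python
from typing import List, Tuple

# Hypothetical encoder layout: (kernel, stride, padding) for each conv layer.
# These values are assumptions for illustration, not DANIEL's real settings.
ASSUMED_LAYERS: List[Tuple[int, int, int]] = [(3, 2, 1)] * 5


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_grid(
    height: int,
    width: int,
    layers: List[Tuple[int, int, int]] = ASSUMED_LAYERS,
) -> Tuple[int, int]:
    """Spatial size of the feature map after applying every layer in turn."""
    for kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
    return height, width


def num_visual_tokens(
    height: int,
    width: int,
    layers: List[Tuple[int, int, int]] = ASSUMED_LAYERS,
) -> int:
    """Length of the flattened feature sequence a decoder would attend over."""
    grid_h, grid_w = feature_grid(height, width, layers)
    return grid_h * grid_w


if __name__ == "__main__":
    # Different page sizes give different sequence lengths; nothing is resized.
    for page in [(1024, 768), (1000, 700), (2048, 1536)]:
        print(page, feature_grid(*page), num_visual_tokens(*page))
```

Under these assumptions the sequence length grows with the page area, so an attention-based decoder can consume the flattened grid whatever the input resolution.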

Related Organizations

UNIVERSITE DE ROUEN NORMANDIE
France
University of Rouen
France

Keywords

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Artificial Intelligence

Impact by BIP!

	selected citations: 6. Citations derived from selected sources; an alternative to the "Influence" indicator.
	popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Top 10%. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Related to Research communities

Knowmad Institut