Structural Analysis and Understanding of Graphical Layouts in Documents

In computer vision applied to document understanding, layout is a fundamental component. Many document categories follow a (pseudo-)structural formalism that describes a document as a valid instance of a given syntax. Examples include invoices, forms, graphical diagrams and scientific articles. In all these cases, the constituent terms are not only recognized individually by an OCR engine; the geometric context in which they appear also plays a fundamental role, especially when the image interpretation involves semantic labeling (Named Entity Recognition). Layout information in the form of graphical elements (tables, figures, paragraphs, etc.) plays a vital role in conveying the rich and valuable information contained in a document. In this work, we present novel end-to-end deep-learning-based object detection frameworks, using different public benchmark datasets, to localize and structurally understand complex graphical layouts in document images. We also investigate transfer learning and domain adaptation as ways to handle the scarcity of labeled training data for the object detection task in document images. Performance analysis and extensive experiments have been carried out on benchmark datasets such as PubLayNet, ICDAR-POD 2017 and ICDAR-RDCL 2019 to study the impact of these concepts and derive significant insights. Finally, we propose an automated generative model using Graph Neural Networks (GNNs) to generate synthetic data that can be used to train document interpretation systems, in this case especially for digital mailroom applications. Our synthetic graph generation model is also the first baseline approach experimented on administrative document images, in this case invoices. Additionally, a novel dataset derived from RVL-CDIP invoice data has been contributed to the community.

This is the master's thesis dissertation of Sanket Biswas, who graduated from the Master of Computer Vision (MCV) course in the 2019-20 session. His dissertation was awarded an excellent grade by the MCV thesis committee, consisting of Jordi Gonzalez Sabate (UAB), Veronica Vilaplana (UPC) and Jorge Bernal (UAB). His MCV thesis was supervised by Josep Lladós, Director of the Computer Vision Center (CVC) and Associate Professor at the Universitat Autonoma de Barcelona (UAB).

Related Organizations

Autonomous University of Barcelona
Spain

Keywords

synthetic data generation, document layout generation, graphical layout understanding, domain adaptation, structural pattern recognition, document layout analysis, transfer learning, document object detection
