ZENODO
Thesis . 2020
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2020
License: CC BY
Data sources: ZENODO

Structural Analysis and Understanding of Graphical Layouts in Documents

Authors: Biswas, Sanket

Abstract

In computer vision applied to document understanding, layout is a fundamental component. Many document categories follow a (pseudo-)structural formalism that describes a document as a valid instance of a given syntax. Examples exist in different categories: invoices, forms, graphical diagrams, scientific articles, and so on. In all these cases, the constituent terms are not only recognized individually by an OCR engine; the geometric context in which they appear also plays a fundamental role, especially when the image interpretation involves semantic labeling (Named Entity Recognition). Layout information in the form of graphical elements (tables, figures, paragraphs, etc.) plays a vital role in conveying the rich and valuable information contained in a document. In this work, we present novel end-to-end deep-learning-based object detection frameworks, using different public benchmark datasets, to localize and structurally understand complex graphical layouts in document images. We also investigate transfer learning and domain adaptation to handle the scarcity of labeled training data for the object detection task in document images. Performance analysis and extensive experiments have been carried out on benchmark datasets such as PubLayNet, ICDAR-POD 2017, and ICDAR-RDCL 2019 to study the impact of these concepts and derive significant insights. Finally, we propose an automated generative model using Graph Neural Networks (GNNs) to generate synthetic data that can be used to train document interpretation systems, in this case especially for digital mailroom applications. Notably, our synthetic graph generation model is also the first baseline approach applied to administrative document images, in this case invoices. Additionally, a novel dataset derived from RVL-CDIP invoice data has been contributed to the community.

This is the master's thesis of Sanket Biswas, who graduated from the Master of Computer Vision (MCV) program in the 2019-20 session. The dissertation was awarded an excellent grade by the MCV thesis committee, consisting of Jordi Gonzalez Sabate (UAB), Veronica Vilaplana (UPC), and Jorge Bernal (UAB). The thesis was supervised by Josep Lladós, Director of the Computer Vision Center (CVC) and Associate Professor at the Universitat Autonoma de Barcelona (UAB).

Keywords

synthetic data generation, document layout generation, graphical layout understanding, domain adaptation, structural pattern recognition, document layout analysis, transfer learning, document object detection
