ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite

FATURA Dataset

Authors: Limam, Mahmoud; Dhiaf, Marwa; Kessentini, Yousri
Abstract

The dataset consists of 10,000 JPG images and 3 × 10,000 JSON annotation files. The images are generated from 50 different templates, with 200 images generated per template. We provide annotations in three formats: our own original format, the COCO format, and a format compatible with HuggingFace Transformers. The dataset contains 24 object classes. The classes vary considerably in their number of occurrences, so the dataset is somewhat imbalanced. The annotations contain bounding box coordinates, bounding box text, and object classes.

We propose two strategies for training and evaluating models. The models were trained until convergence, i.e., until performance on the validation split peaked and the model started overfitting. The model version used for evaluation is the one with the best validation performance.

First evaluation strategy: For each template, the generated images are randomly split into three subsets: training, validation, and testing. In this scenario, the model trains on all templates and is thus tested on new images rather than new layouts.

Second evaluation strategy: The real templates are randomly split into a set of training templates and a common set of templates for validation and testing. All the variants created from the training templates form the training dataset. The validation and testing datasets are formed in the same way from the remaining templates, so they are made up of the same templates but of different images. This approach tests the models' performance on unseen templates/layouts, rather than on the same templates with different content.

We provide the data splits we used for each evaluation scenario. We also provide the background colors we used as augmentation for each template.
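The two evaluation strategies can be illustrated with a minimal Python sketch. It is not the authors' splitting code, and the published split files should be preferred for reproducing results. The 80/10/10 ratio, the choice of 40 training templates, the half-and-half division of held-out images, and the file names are all assumptions made for illustration; only the figures of 50 templates and 200 images per template come from the description above.

```python
import random


def split_within_templates(images_by_template, fractions=(0.8, 0.1, 0.1), seed=0):
    """Strategy 1: split every template's images into train/val/test.

    The fractions are an assumed ratio, not taken from the dataset description.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for template in sorted(images_by_template):
        shuffled = list(images_by_template[template])
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * fractions[0])
        n_val = round(len(shuffled) * fractions[1])
        splits["train"] += shuffled[:n_train]
        splits["val"] += shuffled[n_train:n_train + n_val]
        splits["test"] += shuffled[n_train + n_val:]
    return splits


def split_across_templates(images_by_template, n_train_templates=40, seed=0):
    """Strategy 2: hold out whole templates for validation and testing.

    Validation and testing share the held-out templates but use different
    images. The number of training templates is an assumption.
    """
    rng = random.Random(seed)
    templates = sorted(images_by_template)
    rng.shuffle(templates)
    splits = {"train": [], "val": [], "test": []}
    for template in templates[:n_train_templates]:
        splits["train"] += images_by_template[template]
    for template in templates[n_train_templates:]:
        shuffled = list(images_by_template[template])
        rng.shuffle(shuffled)
        half = len(shuffled) // 2  # assumed even division between val and test
        splits["val"] += shuffled[:half]
        splits["test"] += shuffled[half:]
    return splits


if __name__ == "__main__":
    # Hypothetical file names: 50 templates with 200 generated images each.
    data = {
        f"template{t}": [f"template{t}_{i}.jpg" for i in range(200)]
        for t in range(50)
    }
    for name, fn in [("within", split_within_templates),
                     ("across", split_across_templates)]:
        sizes = {k: len(v) for k, v in fn(data).items()}
        print(name, sizes)
```

Under the first function every template contributes to all three subsets, so test images share layouts with training images. Under the second, no held-out template appears in training, which is what makes it a test of generalization to unseen layouts.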

This dataset was developed in the Digital Research Center of Sfax.

Keywords

Layout analysis, Information Extraction, Document Understanding, Invoice Data
