ZENODO
Other ORP type . 2019
License: CC BY
Data sources: Datacite
Data sources: ZENODO

OCR models for Occitan (standard spelling)

Authors: Marianne Vergez-Couret

Abstract

This dataset provides trained Tesseract (https://github.com/tesseract-ocr/tesseract) and Jochre (https://github.com/urieli/jochre) OCR models for Occitan (for the standard spelling and two dialects, Gascon and Lengadocian). The models were developed in the context of the RESTAURE project, funded by the French ANR. Two models are provided. They were presented in the article https://hal.archives-ouvertes.fr/hal-01252241 and re-evaluated for the creation of another corpus in https://www.openscience.fr/Constitution-et-annotation-d-un-corpus-ecrit-de-contes-et-recits-en-occitan.

The first model, JOCHRE_2015, was trained for Jochre 1.1.2b. The training images and corresponding texts were manually annotated using a Jochre online platform (excerpts from 7 different printed works, totalling about 20,000 words).

The second model, TESS_2015, was trained for Tesseract using the jTessBoxEditor tool (http://vietocr.sourceforge.net/training.html), Version 1.4 (2 May 2015), based on images automatically generated from the training texts (the same texts used for Jochre). The images were generated with a 36pt font size and two fonts (Arial and Times New Roman), each in its normal and italic variants. The Tesseract model can be used with Tesseract 3.0x.

A word list was also used for both trainings. We merged Occitan words found in several lexicons, dictionaries and corpora for the two dialects, Gascon and Lengadocian:

  • Lexicon extracted from 60 literary works (by 29 different authors) gathered in the BaTelÒc project.
  • Dictionary entries from Dictionnaire Français/Occitan Gascon Toulousain by Nicolau Rei Bèthvéder, 2004, IEO Edicions.
  • Dictionary entries from Dictionnaire Français/Occitan by Cristian Laus, 2004, IEO/IDECO.
  • Dictionary entries from Dictionnaire Français/Occitan (Gascon) by Miquèu Grosclaude, Gilabèrt Nariòo and Patric Guilhemjoan, 2007, Per Noste Edicions.
  • Conjugated forms from Verb’Òc, designed by the Congrès permanent de la lenga occitana (http://www.locongres.org).
  • List of proper nouns extracted from the Occitan lexicon of Apertium (a free/open-source machine translation platform).

The Jochre model can be used with the Jochre software (https://github.com/urieli/jochre). See also the Jochre wiki (https://github.com/urieli/jochre/wiki). The Tesseract model can be used, for instance, with the gImageReader tool (https://github.com/manisandro/gImageReader), which provides a graphical user interface for Tesseract.

When evaluated against the same test corpus (four extracts by four different authors from the two dialects, Gascon and Lengadocian), the Jochre model achieves the better performance of the two.
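
Besides a graphical front end such as gImageReader, a Tesseract model can also be called from the command line. The following is a minimal sketch, not part of the dataset: it assumes the `tesseract` executable is installed, that the trained model has been saved as a `.traineddata` file in a local directory, and that the installed Tesseract version accepts the `--tessdata-dir` option. The language name `oci`, the directory `models` and the file names are hypothetical placeholders; the real language name is whatever precedes `.traineddata` in the model file shipped with the dataset.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang, tessdata_dir=None):
    """Build the argument list for a Tesseract command-line call.

    `lang` must match the stem of the .traineddata file
    (e.g. "oci" for a hypothetical "oci.traineddata").
    """
    cmd = ["tesseract", str(image), str(output_base), "-l", lang]
    if tessdata_dir is not None:
        # Assumption: the installed Tesseract supports --tessdata-dir.
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def run_ocr(image, output_base, lang, tessdata_dir=None):
    """Run Tesseract and return the recognised text.

    Tesseract writes its plain-text result to "<output_base>.txt".
    """
    cmd = build_tesseract_command(image, output_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical names: adjust to the actual model file and scanned page.
    print(build_tesseract_command("page.png", "page", "oci", "models"))
```

Calling `run_ocr("page.png", "page", "oci", "models")` would then run the recognition and read back `page.txt`, provided the placeholders are replaced with real paths.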

Keywords

OCR module, Tesseract, Jochre, Occitan, OCR
