Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO, Datacite
Versions: 2

MiDRASH Automatic Transcriptions of the Cairo Geniza Fragments

Authors: Stoekl Ben Ezra, Daniel; Bambaci, Luigi; Kiessling, Benjamin; Lapin, Hayim; Ezer, Nurit; Lolli, Elena; Rustow, Marina; +10 Authors

Abstract

This is the first automatic transcription of the entire collection of digital images of the Geniza at the National Library of Israel as of this date. It was created using kraken version 5.3.1.dev56. To find a fragment, enter its 99 ID number into KTIV.

This is a very preliminary and imperfect result, which we are releasing now because of its high value for scholarship even in its current form. We are aware of the following shortcomings:

  • There are segmentation and text recognition mistakes.
  • Some texts have the wrong reading order, with the left region preceding the right.
  • Vertical text has mostly been ignored.
  • Many images with 3 or 4 parallel text regions only have the outer ones.
  • Arabic script recognition is weaker than Hebrew script recognition.

The pipeline comprised three steps:

a) An image classifier to choose the best layout segmentation and recognition models (https://edizionicafoscari.it//it/edizioni/riviste/magazen/2024/2/netlay-layout-classification-dataset-for-enhancing/#d670e63).
b) Region and line segmentation with kraken.
c) Text recognition with kraken.

Funded by the European Union (ERC, MiDRASH, Project No. 101071829). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
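The reading-order issue mentioned in the abstract (a left region placed before the right one) can in some cases be worked around on the user side. The following is a minimal sketch, not the authors' method: it assumes each text region is available as a pixel bounding box plus its text, and the `Region` class and `rtl_reading_order` function are hypothetical names introduced only for illustration. Regions that overlap vertically are treated as side-by-side columns and read right to left, as is usual for Hebrew and Arabic script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A text region with a hypothetical pixel bounding box and its text."""
    left: int
    top: int
    right: int
    bottom: int
    text: str


def overlaps_vertically(a: Region, b: Region) -> bool:
    """True if the two regions share part of their vertical extent."""
    return a.top < b.bottom and b.top < a.bottom


def rtl_reading_order(regions: list[Region]) -> list[Region]:
    """Group regions into horizontal bands, then read each band right to left."""
    bands: list[list[Region]] = []
    for region in sorted(regions, key=lambda r: r.top):
        # Join the last band if this region sits beside one of its members.
        if bands and any(overlaps_vertically(region, other) for other in bands[-1]):
            bands[-1].append(region)
        else:
            bands.append([region])
    ordered: list[Region] = []
    for band in bands:
        # Rightmost column first, as expected for right-to-left scripts.
        ordered.extend(sorted(band, key=lambda r: r.right, reverse=True))
    return ordered


# Assumed example page: two columns above a full-width footer.
page = [
    Region(left=50, top=100, right=400, bottom=900, text="left column"),
    Region(left=450, top=100, right=800, bottom=900, text="right column"),
    Region(left=50, top=950, right=800, bottom=1000, text="footer"),
]
print([r.text for r in rtl_reading_order(page)])
# prints ['right column', 'left column', 'footer']
```

This heuristic only handles simple column layouts. Marginal notes, vertical text, and interlinear additions would need more careful treatment, and the bounding boxes would first have to be read from whatever format the transcriptions are distributed in.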

Keywords

fragmentology, kraken, Judeo-Arabic, Cairo Geniza, manuscripts, Aramaic, Digital Humanities, OCR, ATR, Computational Humanities, Hebrew, HTR, Jewish Studies, layout segmentation, image classification

  • Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.