Printed Arabic-Script Base Model Trained on the OpenITI Corpus

This is a text recognition model trained on the OpenITI dataset of printed Arabic-script text, available at [0], in its state of 2022-09-03. It encompasses real-world Arabic (~23k lines), Persian (~17k lines), Urdu (~11k lines), and Ottoman Turkish (~7100 lines) material in a variety of typefaces, augmented by synthetic data in the Tahoma typeface (600 lines).

As the model is trained on a variety of languages and highly diverse typefaces, it is mostly intended as a base model for fine-tuning more specific models. In line with this, it has not been extensively verified or optimized. The ground truth was lightly normalized to NFD but is otherwise untouched.

[0]: https://github.com/OpenITI/arabic_print_data.git
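Because the ground truth was normalized to NFD, transcriptions used for fine-tuning or evaluation will likely match the model's output best if they are brought into the same form first. The following is a minimal sketch of that step using only the Python standard library; the helper name `to_nfd` is hypothetical and is not part of kraken or of the OpenITI tooling, and any further normalization the authors may have applied is not reproduced here.

```python
import unicodedata


def to_nfd(line: str) -> str:
    """Return the line in Unicode Normalization Form D (canonical decomposition).

    Hypothetical helper for illustration; not part of kraken or OpenITI.
    """
    return unicodedata.normalize("NFD", line)


# U+0622 (ARABIC LETTER ALEF WITH MADDA ABOVE) decomposes canonically
# into U+0627 (ALEF) followed by U+0653 (MADDAH ABOVE).
composed = "\u0622"
decomposed = to_nfd(composed)

# Show the code points before and after normalization.
print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in decomposed])
```

Applying the same function to every line of a fine-tuning set keeps precomposed and decomposed spellings of the same character from being treated as different targets.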

Related Organizations

École Pratique des Hautes Études
France

Keywords

kraken_pytorch


