Yarmouk Arabic OCR Dataset

Iyad Abu Doush; Faisal Alkhateeb; Anwaar Hamdi Gharibeh


Article . 2018 . Peer-reviewed

Data sources: Crossref


Data sources: Microsoft Academic Graph


Publication: Article, 01 Jul 2018. Publisher: IEEE. Journal: 2018 8th International Conference on Computer Science and Information Technology (CSIT)


doi: 10.1109/csit.2018.8486162


Abstract

Optical Character Recognition (OCR) is the process of automatically recognizing characters in scanned documents or images. OCR software uses machine learning to recognize the characters in a document, so it must first go through a training phase in which it learns to recognize the letters in the text. The training phase requires a standard dataset, which can also be used to evaluate the obtained results. In this research, we propose a dataset for printed Arabic OCR. To the best of our knowledge, no Arabic OCR dataset with ground truth is available to the research community at a size suitable for building a robust Arabic OCR system. The proposed dataset is extracted randomly from Wikipedia so that it covers different topics. It consists of 4,587 Arabic articles with a total of 8,994 images.
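The abstract says the dataset's ground truth can be used to evaluate OCR results but does not name a metric. As a minimal sketch, assuming character error rate (CER) is the metric of interest, the comparison of an OCR output against a ground-truth transcription could look like the following. The function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character of ref missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not ref:
        raise ValueError("ground-truth text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    ground_truth = "كتاب"  # assumed example ground-truth word
    ocr_output = "كتب"     # assumed OCR output with one missing letter
    print(character_error_rate(ground_truth, ocr_output))
```

Lengths here are counted in Unicode code points, so Arabic diacritics count as separate characters. Whether to strip or normalize them before scoring is a design choice that this sketch leaves open.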

Related Organizations

Yarmouk University
Jordan
American University of Kuwait
Kuwait

Impact by BIP!

- Selected citations: 11. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

