MDIW-13: New Database and Benchmark for Script Identification

Name: MDIW-13: New Database and Benchmark for Script Identification
Keywords: script identification, optical character recognition, Multi-script database, deep learning for script identification, Document analysis, handcrafted features for script identification

Research data / Dataset · 10 Mar 2022 · Publisher: Zenodo

Authors: Ferrer, Miguel A.; Das, Abhijit; Diaz, Moises; Morales, Aythami; Carmona-Duarte, Cristina; Pal, Umapada

DOIs: 10.5281/zenodo.6376096, 10.5281/zenodo.6343658, 10.5281/zenodo.6343657

Abstract

Script identification is a necessary step in some document-analysis applications that operate in multi-script, multi-language environments. This paper provides a new database for benchmarking script identification algorithms. It contains both printed and handwritten documents covering 13 scripts: Arabic, Bengali (Bangla), Gujarati, Gurmukhi, Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu, and Thai. The dataset consists of 1,135 documents scanned from local newspapers and from handwritten letters and notes by different native writers. These documents are further segmented into lines and words, yielding a total of 13,979 lines and 86,655 words. Easy-to-use baseline benchmarks are proposed with both handcrafted and deep learning methods. The benchmark reports results at the document, line, and word levels for printed and handwritten documents, as well as results that are independent of the document/line/word level and of the printed/handwritten distinction.

Database download: https://www.dropbox.com/s/vtmy0l4gjxun0oe/Multiscript_SIW_Database_Feb25_acceptedPaper.zip?dl=0

If you find the database useful, please cite our work:

- M. A. Ferrer, A. Das, M. Diaz, A. Morales, C. Carmona-Duarte, U. Pal (2022), "MDIW-13: New Database and Benchmark for Script Identification", Multimedia Tools and Applications, pp. 1-14. Accepted.
- A. Das, M. A. Ferrer, A. Morales, M. Diaz, U. Pal, et al., "SIW 2021: ICDAR Competition on Script Identification in the Wild", 16th International Conference on Document Analysis and Recognition (ICDAR 2021), Lecture Notes in Computer Science, vol. 12824, Springer, Lausanne, Switzerland, Sep. 5-10, 2021, pp. 738-753. doi: 10.1007/978-3-030-86337-1_49
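The abstract reports script identification results per level (document, line, word), per writing style (printed, handwritten), and independently of both. As a rough illustration of how such a breakdown could be tallied, the Python sketch below computes accuracy for each level, each style, and overall. The record format `(level, style, true_script, predicted_script)`, the function name `accuracy_by_group`, and the label strings are hypothetical assumptions; they are not the dataset's actual annotation format or the authors' evaluation code, and the paper's own metric may differ.

```python
from collections import defaultdict

# The 13 scripts listed in the abstract (label spellings are assumed).
SCRIPTS = frozenset({
    "Arabic", "Bengali", "Gujarati", "Gurmukhi", "Devanagari", "Japanese",
    "Kannada", "Malayalam", "Oriya", "Roman", "Tamil", "Telugu", "Thai",
})


def accuracy_by_group(records):
    """Hypothetical evaluation helper.

    records: iterable of (level, style, true_script, predicted_script),
    where level is e.g. "document", "line" or "word" and style is
    "printed" or "handwritten" (assumed format).
    Returns accuracy per level, per style, and a combined "all" entry.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, style, truth, pred in records:
        if truth not in SCRIPTS:
            raise ValueError(f"unknown script label: {truth!r}")
        hit = int(truth == pred)
        # Each sample counts toward its level, its style, and the overall score.
        for key in (level, style, "all"):
            total[key] += 1
            correct[key] += hit
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    sample = [
        ("document", "printed", "Tamil", "Tamil"),
        ("line", "handwritten", "Thai", "Thai"),
        ("line", "printed", "Oriya", "Bengali"),
        ("word", "handwritten", "Roman", "Roman"),
    ]
    print(accuracy_by_group(sample))
```

With the four toy records above, the line level and the printed style each score 0.5 (one of two correct), while the overall accuracy is 0.75. Keeping level and style as separate keys mirrors the abstract's distinction between level-specific and level-independent results.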

Related Organizations

- University of California System, United States
- Griffith University, Australia
- Autonomous University of Madrid, Spain
- University of Las Palmas de Gran Canaria, Spain
- Indian Statistical Institute, India

Impact (by BIP!)

- Citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average

Usage (by UsageCounts)

- Views: 27
- Downloads: 17