
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
Electronic Data Processing; Handwriting; Models, Statistical; Information Storage and Retrieval; Reproducibility of Results; Classification and Association Rules; Numerical Analysis, Computer-Assisted; Signal Processing, Computer-Assisted; Documentation; Image Enhancement; Sensitivity and Specificity; Clustering; Pattern Recognition, Automated; Document Analysis; Reading; Artificial Intelligence; Subtraction Technique; Image Interpretation, Computer-Assisted; Information systems; Script Identification; Texture; Wavelets and Fractals; Algorithms
| Indicator | Value |
| --- | --- |
| Selected citations | 128 |
| Popularity | Top 10% |
| Influence | Top 1% |
| Impulse | Top 10% |
