
pmid: 18787240
This paper presents a document retrieval technique that is capable of searching document images without OCR (optical character recognition). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
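As a rough illustration of the idea in the abstract, the following sketch codes words by coarse shape features and retrieves documents by matching codes. It is a text-side toy under stated assumptions, not the paper's method: the real technique extracts features from word images, whereas here the letter sets `ASCENDERS`, `DESCENDERS`, and `HOLES` are hypothetical lookup tables for a typical lowercase Latin font, water reservoirs are omitted, and all function names are invented for this example.

```python
from collections import defaultdict

# Assumed feature sets for lowercase Latin letters in a typical font.
# Illustrative only; these tables are not taken from the paper.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch):
    """Return a two-symbol code: vertical extent plus a hole flag."""
    if ch in ASCENDERS:
        extent = "A"
    elif ch in DESCENDERS:
        extent = "D"
    else:
        extent = "x"
    hole = "o" if ch in HOLES else "-"
    return extent + hole


def word_shape_code(word):
    """Concatenate per-character codes, ignoring case and non-letters."""
    return "".join(char_code(ch) for ch in word.lower() if ch.isalpha())


def build_index(documents):
    """Map each word shape code to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, words in documents.items():
        for word in words:
            index[word_shape_code(word)].add(doc_id)
    return index


def search(index, keyword):
    """Return sorted ids of documents holding a word with the keyword's code."""
    return sorted(index.get(word_shape_code(keyword), set()))


# Hypothetical two-document collection, already split into words.
docs = {"doc1": ["word", "shape"], "doc2": ["image", "retrieval"]}
index = build_index(docs)
matches = search(index, "shape")
```

Because the code discards most character detail, distinct words can share a code (in this toy scheme "on" and "an" do), so a keyword match is a candidate hit rather than a guaranteed one.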
Electronic Data Processing, Word shape coding, Databases, Factual, Document image analysis, Information Storage and Retrieval, Documentation, Image Enhancement, 004, Pattern Recognition, Automated, Reading, Artificial Intelligence, Image Interpretation, Computer-Assisted, Document image retrieval, Database Management Systems, Language
| selected citations | 62 |
| popularity | Top 10% |
| influence | Top 10% |
| impulse | Top 10% |
