
Optical Character Recognition (OCR) is the process of extracting text from an image. The main purpose of an OCR system is to produce editable documents from existing paper documents or image files. OCR primarily works in two phases: character detection and word detection. More sophisticated approaches also perform sentence detection to preserve the document's structure. In this paper, we discuss the process of developing an OCR system for the Bengali language. Considerable effort has been put into developing OCR for Bengali. Although some OCR systems have been developed, none of them is completely error free. For our thesis, we trained the Tesseract OCR engine to develop an OCR system for the Bengali language. Tesseract is among the most accurate OCR engines currently available. It was originally developed at HP Labs and is currently sponsored by Google. Tesseract offers two training options: Legacy training and LSTM training. We carried out both.
bepress|Engineering|Computer Engineering|Other Computer Engineering, Other Computer Engineering, Engineering, engrXiv|Engineering, bepress|Engineering, engrXiv|Engineering|Computer Engineering, engrXiv|Engineering|Computer Engineering|Other Computer Engineering, Computer Engineering, bepress|Engineering|Computer Engineering
