
doi: 10.1109/das.2016.74
Written texts are both abstract and physical objects: ideas, signs and shapes, whose meanings, graphical systems and social connotations evolve through time. To study this dual nature of texts, paleographers need to analyse large-scale corpora at the finest granularity, such as character shape. This goal can only be reached through an automatic segmentation process. In this paper, we present a method, based on Handwritten Text Recognition, to automatically align images of digitized manuscripts with texts from scholarly editions, at the levels of page, column, line, word, and character. It has been successfully applied to two datasets of medieval manuscripts, which are now almost fully segmented at character level. The quality of the word and character segmentations is evaluated and further paleographical analyses are presented.
Text recognition, Image segmentation, Paleography, Character recognition, Shape, [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Error analysis, [SHS.HIST] Humanities and Social Sciences/History, word and character segmentation, automatic text recognition, Training, Hidden Markov models
