publication . Conference object . Other literature type . 2019

An Analysis of the Performance of Named Entity Recognition over OCRed Documents

Ahmed Hamdi; Axel Jean-Caurant; Nicolas Sidère; Mickaël Coustaty; Antoine Doucet
Open Access English
  • Published: 02 Jun 2019
  • Publisher: HAL CCSD
  • Country: France
Abstract
Using digital libraries requires easy access to documents, which depends heavily on the quality of document indexing. Named entities are among the most important information for indexing digital documents. According to a recent study, 80% of the top 500 queries sent to a digital library portal contained at least one named entity [2]. However, most digitized documents are indexed through their OCRed version, which includes numerous errors that may hinder access to them. Named Entity Recognition (NER) is the task of locating important names in a given text and categorizing them into a set of predefined classes (person, location, organizat...
Subjects
ACM Computing Classification System: ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
free text keywords: [INFO]Computer Science [cs], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL], Indexing, OCR, Named Entity, Extraction, Digital Libraries, Categorization, Digital library, Information retrieval, Named entity, Named-entity recognition, Search engine indexing, Computer science
Funded by
EC| NewsEye
Project
NewsEye
NewsEye: A Digital Investigator for Historical Newspapers
  • Funder: European Commission (EC)
  • Project Code: 770299
  • Funding stream: H2020 | RIA
Validated by funder
Download from
Hal-Diderot
Conference object . 2019
Provider: Hal-Diderot
Zenodo
Conference object . 2019
Provider: Datacite
Zenodo
Other literature type . 2019
Provider: Datacite