ZENODO
Article . 2026
License: CC BY
Data sources: ZENODO

ACCELERATING PATHOLOGY REPORT DIGITIZATION: A MULTI-ENGINE OCR AND LLM FRAMEWORK FOR HEALTHCARE APPLICATIONS

Authors: Journal of Theoretical and Applied Information Technology

Abstract

Digitization and structuring of pathology reports are essential in modern healthcare for enhancing patient care, data analytics, and medical research. This study presents a framework called Dual-integrated Text Extraction using Hybrid OCR Engines (DiText-OCR), which leverages multiple OCR tools and domain-specific dictionaries to accurately digitize diverse text types, including printed text and low-quality scans. The extracted text is further processed using Large Language Models (LLMs) for named entity recognition, relationship extraction, and data structuring. The resulting structured data are integrated into healthcare databases and systems, enabling applications in clinical decision support, research, and analytics while ensuring interoperability. Despite its effectiveness, the framework faces challenges, such as handling non-standard report formats, maintaining patient privacy, and addressing the current limitations of OCR and LLM technologies in medical contexts. Future research aims to integrate this system with electronic health records, extend its application to other medical documents, and utilize structured data for advanced research and predictive analytics. By addressing these challenges, the proposed framework has the potential to revolutionize medical data management, ultimately improving patient outcomes, enhancing clinical efficiency, and fostering innovation in healthcare.

Keywords

Pathology report digitization, DiText-OCR framework, Optical Character Recognition (OCR), Large Language Models (LLMs), healthcare data interoperability, clinical decision support.
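
The abstract describes combining several OCR engines with domain-specific dictionaries but does not say how the outputs are reconciled. The following is only a minimal sketch of one plausible approach, not the authors' DiText-OCR method: it assumes each engine has already produced a plain-text candidate, picks the candidate with the most dictionary hits, and then snaps near-miss tokens to dictionary terms. The names PATHOLOGY_TERMS, correct_token, dictionary_score, and merge_ocr_outputs, the tiny vocabulary, and the 0.8 similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib
import re

# Hypothetical domain dictionary; a real one would be far larger.
PATHOLOGY_TERMS = [
    "carcinoma", "biopsy", "benign", "malignant",
    "specimen", "margin", "lymph", "node",
]


def correct_token(token, vocabulary, cutoff=0.8):
    """Return the closest dictionary term (lowercased) if similar enough,
    otherwise return the token unchanged."""
    lowered = token.lower()
    if lowered in vocabulary:
        return lowered
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def dictionary_score(text, vocabulary):
    """Fraction of alphabetic tokens that are exact dictionary terms."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in vocabulary)
    return hits / len(tokens)


def merge_ocr_outputs(candidates, vocabulary):
    """Pick the candidate text with the highest dictionary score,
    then apply dictionary-based correction to each of its tokens."""
    best = max(candidates, key=lambda text: dictionary_score(text, vocabulary))
    return re.sub(
        r"[A-Za-z]+",
        lambda m: correct_token(m.group(0), vocabulary),
        best,
    )


if __name__ == "__main__":
    # Stand-ins for the outputs of two different OCR engines on the same scan.
    engine_outputs = [
        "Bi0psy shows ben1gn specimen",
        "Biopsy shows benlgn specimen",
    ]
    print(merge_ocr_outputs(engine_outputs, PATHOLOGY_TERMS))
```

Selecting a whole candidate and then correcting tokens keeps the sketch short; a per-token vote across engines would need text alignment, which is omitted here. Note that matched terms are lowercased, and that fuzzy matching can overcorrect valid words that merely resemble a dictionary term, so a production system would need a stricter policy.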
