Powered by OpenAIRE graph
INRIA2
Conference object . 2024
Data sources: INRIA2
HAL-Rennes 1
Conference object . 2024
Data sources: HAL-Rennes 1
https://doi.org/10.1109/is6175...
Article . 2024 . Peer-reviewed
License: STM Policy #29
Data sources: Crossref

This Research product is the result of merged Research products in OpenAIRE.

Phenotypes Extraction from Text: Analysis and Perspective in the LLM Era

Authors: Baddour, Moussa; Paquelet, Stéphane; Rollier, Paul; Tayrac, Marie; Dameron, Olivier; Labbé, Thomas;

Abstract

Collecting the relevant list of patient phenotypes, known as deep phenotyping, can significantly improve the final diagnosis. As textual clinical reports are the richest source of phenotype information, their automatic extraction is a critical task. The main challenges of this Information Extraction (IE) task are to identify precisely the text spans related to a phenotype and to link them unequivocally to referenced entities from a source such as the Human Phenotype Ontology (HPO). Recently, Language Models (LMs) have been the most successful approach for extracting phenotypes from clinical reports. Solutions such as PhenoBERT, relying on BERT or GPT, have shown promising results when applied to datasets built on the hypothesis that most phenotypes are explicitly mentioned in the text. However, this assumption is not always true in medical genetics. Hence, although LMs carry powerful semantic abilities, their contribution relative to the syntactic string-matching steps used within current pipelines remains unclear. The goal of this study is to improve phenotype extraction from clinical notes related to genetic diseases. Our contributions are threefold. First, we provide a clear definition of the phenotype extraction task from free text, along with a high-level overview of the functions involved. Second, we conduct an in-depth analysis of PhenoBERT, one of the best existing solutions, to evaluate the proportion of phenotypes predicted with simple string-matching. Third, we demonstrate how incorporating large language models (LLMs) into the span detection step can improve performance, especially for implicit phenotypes. In addition, this experiment revealed that the existing dataset annotations are not exhaustive, and that LLMs can identify relevant spans missed by human labelers.
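
To make the distinction between explicit and implicit phenotypes concrete, the following is a minimal sketch of a string-matching baseline that detects spans and links them to HPO identifiers in one pass. It is not the PhenoBERT pipeline or the authors' method. The function name `match_phenotypes`, the `Mention` type, and the four-entry `TOY_LEXICON` are all assumptions introduced for illustration; a real system would load labels and synonyms from an HPO release.

```python
import re
from typing import Dict, List, NamedTuple


class Mention(NamedTuple):
    """A detected text span together with the HPO identifier it is linked to."""
    start: int
    end: int
    text: str
    hpo_id: str


# Toy lexicon (assumption): a handful of HPO labels chosen for illustration.
# A real lexicon would come from an HPO release, including synonyms.
TOY_LEXICON: Dict[str, str] = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
    "global developmental delay": "HP:0001263",
}


def match_phenotypes(text: str, lexicon: Dict[str, str] = TOY_LEXICON) -> List[Mention]:
    """Return non-overlapping lexicon matches in text, ordered by position."""
    mentions: List[Mention] = []
    # Try longer labels first so that a long label wins over a shorter one
    # contained in the same span.
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # Skip candidates overlapping an already accepted mention.
            if any(m.start() < kept.end and kept.start < m.end() for kept in mentions):
                continue
            mentions.append(Mention(m.start(), m.end(), m.group(0), lexicon[label]))
    return sorted(mentions)


if __name__ == "__main__":
    note = "The patient has microcephaly and recurrent seizures. He started walking at age four."
    for mention in match_phenotypes(note):
        print(mention)
```

On the example note, the matcher finds the two explicitly named phenotypes but returns nothing for "started walking at age four", which a clinician might read as a sign of delayed development. Spans of that kind are what the abstract calls implicit phenotypes, and they are out of reach for matching on ontology labels alone.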

Keywords

LLM, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], phenotype, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, phenoBERT, [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], genetic, entity linking, embeddings

  • Impact by BIP!
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average