BELB: a biomedical entity linking benchmark

Keywords: ddc:004, FOS: Computer and information sciences, Original Paper, Computer Science - Computation and Language, Data Mining, 570 Biologie, ddc:570, 004 Informatik, Computation and Language (cs.CL), Software

Samuele Garda; Leon Weber-Genzel; Robert Martin; Ulf Leser

- Bioinformatics (Article, 2023, peer-reviewed). License: CC BY. Data sources: Crossref
- Bioinformatics (Article, 2023). Data sources: Europe PubMed Central
- PubMed Central (Other literature type, 2023). License: http://creativecommons.org/licenses/by/4.0/. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Data sources: PubMed Central
- arXiv.org e-Print Archive (Preprint, 2023). Data sources: arXiv.org e-Print Archive
- edoc-Server. Open-Access-Publikationsserver der Humboldt-Universität zu Berlin (Article, 2023, peer-reviewed). Data sources: edoc-Server. Open-Access-Publikationsserver der Humboldt-Universität zu Berlin
- https://dx.doi.org/10.48550/ar... (Article, 2023). License: CC BY SA. Data sources: Datacite
- DBLP (Article). Data sources: DBLP

Publication type: Article, Other literature type, Preprint
Date: 01 Nov 2023 (embargo end date: 01 Jan 2023)
Country: Germany
Language: English
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics, volume 39 (eISSN: 1367-4811)
Funded by: DFG | unidentified

Authors: Samuele Garda; Leon Weber-Genzel; Robert Martin; Ulf Leser;

doi: 10.1093/bioinformatics/btad698, 10.48550/arxiv.2308.11537

pmid: 37975879

pmc: PMC10681865

arXiv: 2308.11537

Abstract

Motivation: Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad-coverage KB UMLS, leaving their performance on more specialized KBs (e.g. for genes or variants) understudied.

Results: We therefore developed BELB, a biomedical entity linking benchmark, which provides access in a unified format to 11 corpora linked to 7 KBs and spanning six entity types: gene, disease, chemical, species, cell line, and variant. BELB greatly reduces the preprocessing overhead of testing BEL systems on multiple corpora, offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture: neural approaches fail to perform consistently across entity types, highlighting the need for further studies towards entity-agnostic models.

Availability and implementation: The source code of BELB is available at https://github.com/sg-wbi/belb. The code to reproduce our experiments can be found at https://github.com/sg-wbi/belb-exp.
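To make the task of grounding mentions to a KB concrete, here is a minimal sketch of a dictionary-based linker and its evaluation by accuracy. It rests on stated assumptions: the `Mention` record, the normalization rule, the function names (`normalize`, `build_index`, `link`, `accuracy`) and the toy identifiers `KB:0001`/`KB:0002` are all hypothetical illustrations. They are not BELB's unified format or API, and not the method of any system evaluated in the paper.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for one annotated mention (not BELB's schema)."""
    text: str         # surface form as it appears in the document
    entity_type: str  # e.g. "gene" or "disease"
    gold_id: str      # identifier of the correct entry in the target KB


def normalize(name: str) -> str:
    """Lowercase, map runs of non-alphanumeric characters to one space, trim."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", name.lower()).split())


def build_index(kb: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every normalized synonym to the set of KB identifiers that carry it."""
    index: Dict[str, Set[str]] = {}
    for kb_id, synonyms in kb.items():
        for synonym in synonyms:
            index.setdefault(normalize(synonym), set()).add(kb_id)
    return index


def link(mention: Mention, index: Dict[str, Set[str]]) -> Optional[str]:
    """Exact lookup of the normalized mention; None if no synonym matches.

    Ambiguous names are resolved naively by taking the smallest identifier.
    """
    candidates = index.get(normalize(mention.text))
    return min(candidates) if candidates else None


def accuracy(mentions: Iterable[Mention], index: Dict[str, Set[str]]) -> float:
    """Fraction of mentions whose predicted identifier equals the gold one."""
    mentions = list(mentions)
    if not mentions:
        return 0.0
    hits = sum(link(m, index) == m.gold_id for m in mentions)
    return hits / len(mentions)


# Toy data with made-up identifiers, for illustration only.
kb = {
    "KB:0001": ["breast cancer", "breast carcinoma"],
    "KB:0002": ["BRCA1", "BRCA-1"],
}
mentions = [
    Mention("Breast Carcinoma", "disease", "KB:0001"),
    Mention("brca-1", "gene", "KB:0002"),
    Mention("BRCA 1", "gene", "KB:0002"),
    Mention("tumour", "disease", "KB:0001"),  # no synonym matches
]
index = build_index(kb)
print(f"accuracy = {accuracy(mentions, index):.2f}")  # prints "accuracy = 0.75"
```

The exact-match lookup is only a baseline: it misses mentions like "tumour" whose surface form is absent from the KB's synonym list. The rule-based and neural systems evaluated in the paper use their own, more elaborate matching and ranking, which this sketch does not reproduce.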

Country

Germany

Related Organizations

- Ludwig-Maximilians-Universität München (Germany)
- Humboldt-Universität zu Berlin (Germany)
- Computer Science Department (Germany)

Keywords

ddc:004, FOS: Computer and information sciences, Original Paper, Computer Science - Computation and Language, Data Mining, 570 Biologie, ddc:570, 004 Informatik, Computation and Language (cs.CL), Software, Language, Natural Language Processing

8 related research products (relation: IsRelatedTo)

- RapidFuzz software on GitHub
- belb software on GitHub
- arboEL software on GitHub
- MedMentions software on GitHub
- belb-exp software on GitHub
- s800 software on GitHub
- GenBioEL software on GitHub
- BioSyn software on GitHub

Impact by BIP!

- Selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Related to Research communities

Digital Humanities and Cultural Heritage