Powered by OpenAIRE graph
edoc-Server. Open-Ac...
https://dx.doi.org/10.18452/30...
Doctoral thesis . 2025
License: CC BY
Data sources: Datacite

Representation Learning for Biomedical Text Mining

Authors: Sänger, Mario

Abstract

With the rapid growth of biomedical literature, obtaining comprehensive information about particular biomedical entities and relations by reading alone is becoming increasingly difficult. Text mining approaches seek to ease the processing of these vast amounts of text using machine learning. A central challenge in these approaches is therefore the effective and efficient encoding of all relevant information about specific entities. In this thesis, we contribute to this research by developing machine learning methods for learning entity and text representations based on large-scale publication repositories and diverse information from in-domain knowledge bases. First, we propose two novel relation extraction approaches that use representation learning techniques to create comprehensive models of entities or entity pairs. These models learn low-dimensional embeddings by considering all publications from PubMed that mention a specific entity or pair of entities. We use these embeddings as input to a neural network to classify relations globally, i.e., predictions are based on the entire corpus, not on single sentences or articles as in prior art. In our second contribution, we investigate the impact of multi-modal entity information on biomedical link prediction using knowledge graph embedding methods (KGEMs). Our study enhances existing KGEMs by augmenting biomedical knowledge graphs with multi-modal entity information from in-domain databases. We propose a general framework for integrating this information into the KGEM entity representation learning process. In our third contribution, we augment pre-trained language models (PLMs) with additional context information to identify interactions described in scientific texts. We perform an extensive benchmark that assesses the performance of such models across a wide range of biomedical relation scenarios, providing a comprehensive evaluation of knowledge-augmented PLM-based extraction models that has so far been missing.

The study of relations between biomedical entities forms a cornerstone of modern medicine. Given the rapid growth of the research literature, however, it is becoming increasingly difficult to obtain comprehensive information about particular entities and their relations by reading alone. Text mining approaches seek to ease the processing of these vast amounts of data with the help of machine learning. We contribute to this research by developing methods for learning entity and text representations based on large publication and knowledge bases. First, we propose two novel approaches to relation extraction that use representation learning techniques to learn comprehensive models of biomedical entities and entity pairs. These models learn vector representations by considering all PubMed articles that mention a particular entity or entity pair. We use these vectors as input to a neural network to classify relations globally, i.e., the predictions are based on the entire corpus rather than on individual sentences or articles as in conventional approaches. In our second contribution, we investigate the effects of multi-modal entity information on relation prediction using knowledge graph embedding methods. In our study, we extend existing models by enriching knowledge graphs with multi-modal information. Furthermore, we propose a general framework for integrating this information into the learning process for entity representations. In our third contribution, we augment language models with additional entity information for identifying relations in texts. We carry out an extensive evaluation that captures the performance of such models in several scenarios and thereby provides a comprehensive, but so far missing, assessment of such models.
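To make the idea of "global" relation classification more concrete, the following is a minimal sketch of how a corpus-level representation of an entity pair might be built. It is not the thesis's method: the thesis learns low-dimensional embeddings, whereas this sketch swaps in a hashed bag-of-words vector and simple mean pooling as a stand-in. The function names `toy_embed` and `pair_embeddings`, the dimension `DIM`, and the three-sentence corpus are all hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict
from itertools import combinations

DIM = 8  # hypothetical embedding size, chosen only for this illustration


def toy_embed(text, dim=DIM):
    """Stand-in sentence encoder: hashed bag-of-words, normalised by length.

    Assumption: a real system would use a learned encoder instead.
    """
    tokens = text.lower().split()
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 gives a hash that is stable across Python processes
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    n = len(tokens) or 1
    return [v / n for v in vec]


def pair_embeddings(sentences, embed):
    """Mean-pool the vectors of every sentence that mentions an entity pair.

    `sentences` is an iterable of (entity_list, text) tuples. The result maps
    each sorted entity pair to one vector summarising the whole corpus.
    """
    sums = {}
    counts = defaultdict(int)
    for entities, text in sentences:
        vec = embed(text)
        for pair in combinations(sorted(set(entities)), 2):
            if pair not in sums:
                sums[pair] = [0.0] * len(vec)
            sums[pair] = [s + v for s, v in zip(sums[pair], vec)]
            counts[pair] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


# Toy corpus (illustrative only): each item lists the entities in a sentence.
corpus = [
    (["aspirin", "PTGS2"], "aspirin inhibits PTGS2 activity"),
    (["aspirin", "PTGS2"], "PTGS2 is blocked by aspirin in vitro"),
    (["aspirin", "headache"], "aspirin relieves headache"),
]

embeddings = pair_embeddings(corpus, toy_embed)
for pair, vector in embeddings.items():
    print(pair, [round(x, 3) for x in vector])
```

Each pair ends up with a single vector regardless of how many sentences mention it, so a downstream classifier would predict one relation per pair from the whole corpus rather than one per sentence.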

Country
Germany
Keywords

ddc:004, Multi-modal Entity Information, Biomedical Natural Language Processing, Representation Learning, 570 Biology, Pre-trained Language Models, WC 7700, Knowledge Augmentation, ST 306, Benchmark, 004 Computer Science, Relation Extraction, Machine Learning, Multi-modal Knowledge Graphs, ddc:570, Biomedical Text Mining

  • Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Green