ZENODO
Dataset . 2018
License: CC BY
Data sources: Datacite
Data sources: ZENODO

Phenebank: Processed Medline Abstracts

Authors: Nigel Collier; Mohammad Taher Pilehvar; Adam S. Bernard; Damian Smedley

Abstract

The PheneBank project: Free-text scientific literature has the potential to be an incredibly valuable source of data for uncovering the often hidden relationships between genes, diseases and phenotypes. Phenotypic descriptions cover abnormalities in anatomical structures, processes and behaviours, for example 'growth delay' and 'body weight loss'. Such descriptions form the basis for determining the existence and treatment of a disease but, because of their inherent complexity, have previously received less attention from the text mining community. In recent years, a small number of expert curators have spent significant effort creating coding systems for phenotypes (called "ontologies"), such as the Human Phenotype Ontology (HP) and the Mammalian Phenotype Ontology (MP). The PheneBank project proposes to support and speed up curation using terms discovered directly from the literature and to integrate them automatically with such standard ontologies. The project seeks to harness texts for extracting statistically significant associations between phenotypes, diseases and genes. Earlier approaches did not provide deep semantic representations of the phenotypes they targeted. Our deep learning-based approach is an attempt to overcome this issue by reducing the uncertainty between textual and ontological forms of phenotypes. Specifically, the model treats multitoken named entities as a single token, which allows more reliable handling of multiword expressions. The approach builds on groundbreaking research at the European Bioinformatics Institute by the PI (Nigel Collier) and the Co-investigator (Damian Smedley, Queen Mary University London), including terminology alignment of phenotypes using pairwise scoring of the conceptual elements that make up the phenotype.
https://sites.google.com/site/nhcollier/projects/phenebank

The dataset: As an output of the PheneBank project, we release the set of 24 million MEDLINE abstracts annotated with 9 classes of entity: Phenotype, Disease, Anatomy, Cell, Cell_line, GPR, Gene_variant, Molecule, and Pathway. The entities have been mapped to five major ontologies: SNOMED, HPO, MeSH, PRO, and FMA.

Processing: The NER tagging was done using a BiLSTM-CRF neural model trained on expert-annotated data (to be released for research). The grounding to ontologies relies on semantic embedding of concepts and entities in a unified semantic space.

Data format: The zip file contains 24359010 .txt files organized into 812 directories. Each .txt file is named with a PubMed article ID and contains the corresponding article's abstract and its annotations. Each line starts with a word; for words identified as entities, the entity type and mapping information follow on the same line (tab separated), in the following format:

word <TAB> ::: <TAB> entity_type <TAB> entity_concept_ID_1##confidence_score_1 entity_concept_ID_2##confidence_score_2 ...

Note that the concepts are sorted according to their mapping confidence scores.
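The line format described above can be read with a few lines of Python. The following is a minimal sketch, not the project's own tooling: the names `Token`, `parse_line` and `parse_abstract` are hypothetical, and it assumes that non-entity lines contain only the word, that concept entries are separated by whitespace, and that concept IDs contain no whitespace. None of these details is stated explicitly in the dataset description, so check them against a few real files first.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    entity_type: Optional[str]                      # None for non-entity words
    concepts: List[Tuple[str, Optional[float]]]     # (concept_id, score), in file order


def parse_line(line: str) -> Optional[Token]:
    """Parse one line of an annotated abstract; return None for blank lines."""
    if not line.strip():
        return None
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    word = fields[0]
    # Assumed entity layout: word, ":::", entity_type, concept entries.
    if len(fields) < 3 or fields[1] != ":::":
        return Token(word, None, [])
    concepts = []
    # Assumption: entries look like ID##score and are whitespace-separated.
    for entry in " ".join(fields[3:]).split():
        concept_id, sep, score = entry.rpartition("##")
        if not sep:
            continue  # not an ID##score pair; skip rather than guess
        try:
            concepts.append((concept_id, float(score)))
        except ValueError:
            concepts.append((concept_id, None))
    return Token(word, fields[2], concepts)


def parse_abstract(text: str) -> List[Token]:
    """Parse the full contents of one .txt file into a list of tokens."""
    tokens = (parse_line(line) for line in text.splitlines())
    return [t for t in tokens if t is not None]
```

For example, `parse_line("hypotonia\t:::\tPhenotype\tCONCEPT_A##0.9 CONCEPT_B##0.4")` yields a `Token` whose `concepts` list keeps the order found in the file; the concept IDs and scores in this call are made-up placeholders, not values from the dataset.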

Keywords

Phenotype, MEDLINE, PheneBank
