ZENODO
Dataset . 2021
License: CC BY
Data sources: Datacite



DrugProt corpus: Biocreative VII Track 1 - Text mining drug and chemical-protein interactions

Authors: Krallinger, Martin; Rabal, Obdulia; Miranda-Escalada, Antonio; Valencia, Alfonso


Abstract

Newer version (1.1) contains the training and the development sets: https://zenodo.org/record/5042151

Gold Standard annotations of the DrugProt corpus (training set)

Introduction

The aim of the DrugProt track (similar to the previous CHEMPROT task of BioCreative VI) is to promote the development and evaluation of systems that are able to automatically detect relations between chemical compounds/drugs and genes/proteins. We have therefore generated a manually annotated corpus, the DrugProt corpus, where domain experts have exhaustively labeled: (a) all chemical and gene mentions, and (b) all binary relationships between them corresponding to a specific set of biologically relevant relation types (DrugProt relation classes).

There is also an increasing interest in the integration of chemical and biomedical data, understood as the curation of relationships between biological and chemical entities from text and the storage of such information in the form of structured annotation databases. Such databases are of key relevance not only for biological but also for pharmacological and clinical research. A range of different types of chemical-protein/gene interactions are of key relevance for biology, including metabolic relations (e.g. substrates, products), inhibition, binding or induction associations.

The DrugProt track aims to address these needs and to promote the development of systems able to extract chemical-protein interactions that might be of relevance for precision medicine as well as for drug discovery and basic biomedical research. The DrugProt track in BioCreative VII (BC VII) will explore recognition of chemical-protein entity relations from abstracts.
Teams participating in this track are provided with:

  • PubMed abstracts
  • Manually annotated chemical compound mentions
  • Manually annotated gene/protein mentions
  • Manually annotated chemical compound-protein relations

Zip structure

Training set folder with:

  • drugprot_training_abstracts.tsv: PubMed records
  • drugprot_training_entities.tsv: manually labeled mention annotations of chemical compounds and genes/proteins
  • drugprot_training_relations.tsv: chemical-protein relation annotations

Data format description

The input files for the DrugProt track will be plain-text, UTF8-encoded PubMed records in a tab-separated format with the following three columns:

  • Article identifier (PMID, PubMed identifier)
  • Title of the article
  • Abstract of the article

DrugProt entity mention annotation files contain manually labeled mention annotations of chemical compounds and genes/proteins (so-called gene and protein-related objects, GPRO, as defined during BioCreative V). Such files consist of tab-separated fields containing the following six columns:

  • Article identifier (PMID)
  • Entity or term number (for this record)
  • Type of entity mention (CHEMICAL, GENE-Y, GENE-N)
  • Start character offset of the entity mention
  • End character offset of the entity mention
  • Text string of the entity mention

Example DrugProt entity mention annotations:

    11808879    T12    GENE-Y      1860    1866    KIR6.2
    11808879    T13    GENE-N      1993    2016    glutamate dehydrogenase
    11808879    T14    GENE-Y      2242    2253    glucokinase
    23017395    T1     CHEMICAL    216     223     HMG-CoA
    23017395    T2     CHEMICAL    258     261     EPA

DrugProt relation annotations will be distributed as a file that contains the detailed chemical-protein relation annotations prepared for the DrugProt track.
It consists of tab-separated columns containing:

  • Article identifier (PMID)
  • DrugProt relation
  • Interactor argument 1
  • Interactor argument 2

Example DrugProt relation annotations:

    12488248    INHIBITOR                 Arg1:T1     Arg2:T52
    12488248    INHIBITOR                 Arg1:T2     Arg2:T52
    23220562    ACTIVATOR                 Arg1:T12    Arg2:T42
    23220562    ACTIVATOR                 Arg1:T12    Arg2:T43
    23220562    INDIRECT-DOWNREGULATOR    Arg1:T1     Arg2:T14

Please cite:

    @inproceedings{krallinger2017overview,
      title={Overview of the BioCreative VI chemical-protein interaction Track},
      author={Krallinger, Martin and Rabal, Obdulia and Akhondi, Saber A and P{\'e}rez, Mart{\i}n P{\'e}rez and Santamar{\'\i}a, Jes{\'u}s and Rodr{\'\i}guez, Gael P{\'e}rez and others},
      booktitle={Proceedings of the sixth BioCreative challenge evaluation workshop},
      volume={1},
      pages={141--146},
      year={2017}}

Summary statistics

Training set:

    Documents              3500
    Tokens                 1001168
    Annotated Entities     89529
    Annotated Relations    17288

Annotated Entities:

    CHEMICAL                      46274
    GENE-Y [Normalizable]         28421
    GENE-N [Non-Normalizable]     14834
    Gene Total (N+Y)              43255
    Total                         89529

Annotated Relations:

    INDIRECT-DOWNREGULATOR    1330
    INDIRECT-UPREGULATOR      1379
    DIRECT-REGULATOR          2250
    ACTIVATOR                 1429
    INHIBITOR                 5392
    AGONIST                   659
    AGONIST-ACTIVATOR         29
    AGONIST-INHIBITOR         13
    ANTAGONIST                972
    PRODUCT-OF                921
    SUBSTRATE                 2003
    SUBSTRATE_PRODUCT-OF      25
    PART-OF                   886
    Total                     17288

For further information, please visit https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-1/ or email us at krallinger.martin@gmail.com and antoniomiresc@gmail.com

Related resources:

  • Web
  • Relation annotation guidelines
  • Gene and protein annotation guidelines
  • Chemicals and drugs annotation guidelines
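The entity and relation file layouts described in the abstract can be read with a few lines of Python. The following is a minimal sketch, not part of the DrugProt distribution: the class and function names (Entity, Relation, read_entities, read_relations) are hypothetical, and the code assumes the files have no header row and that quote characters should be treated as literal text. The sample rows are the example annotations quoted in the abstract. In those examples, the end offset minus the start offset equals the length of the mention string, which the sketch uses as a simple sanity check.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    pmid: str
    term_id: str      # e.g. "T12"; numbered per record, so not unique on its own
    entity_type: str  # CHEMICAL, GENE-Y or GENE-N
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class Relation:
    pmid: str
    relation_type: str  # e.g. INHIBITOR
    arg1: str           # term id with the "Arg1:" prefix removed
    arg2: str           # term id with the "Arg2:" prefix removed


def _rows(handle):
    # Assumption: quotes inside mentions are plain text, so quoting is disabled.
    return csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def read_entities(handle):
    """Return {(pmid, term_id): Entity} from a six-column entity stream."""
    entities = {}
    for row in _rows(handle):
        if not row:
            continue  # skip blank lines
        pmid, term_id, entity_type, start, end, text = row
        entities[(pmid, term_id)] = Entity(
            pmid, term_id, entity_type, int(start), int(end), text
        )
    return entities


def read_relations(handle):
    """Return a list of Relation objects from a four-column relation stream."""
    relations = []
    for row in _rows(handle):
        if not row:
            continue
        pmid, relation_type, arg1, arg2 = row
        relations.append(
            Relation(pmid, relation_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
        )
    return relations


def span_length_matches(entity):
    """Check that end - start equals the length of the mention text."""
    return entity.end - entity.start == len(entity.text)


# Example rows copied from the abstract, joined with tabs.
SAMPLE_ENTITIES = (
    "11808879\tT12\tGENE-Y\t1860\t1866\tKIR6.2\n"
    "11808879\tT13\tGENE-N\t1993\t2016\tglutamate dehydrogenase\n"
    "23017395\tT1\tCHEMICAL\t216\t223\tHMG-CoA\n"
)
SAMPLE_RELATIONS = (
    "12488248\tINHIBITOR\tArg1:T1\tArg2:T52\n"
    "12488248\tINHIBITOR\tArg1:T2\tArg2:T52\n"
    "23220562\tACTIVATOR\tArg1:T12\tArg2:T42\n"
    "23220562\tACTIVATOR\tArg1:T12\tArg2:T43\n"
    "23220562\tINDIRECT-DOWNREGULATOR\tArg1:T1\tArg2:T14\n"
)

entities = read_entities(io.StringIO(SAMPLE_ENTITIES))
relations = read_relations(io.StringIO(SAMPLE_RELATIONS))
relation_counts = Counter(r.relation_type for r in relations)
```

To read the actual corpus, the same functions can be given a file opened with open("drugprot_training_entities.tsv", encoding="utf-8", newline="") or the corresponding relations file. Entities are keyed by the pair (PMID, term number) because term numbers such as T1 are reused across records, as the example annotations show.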

DrugProt corpus is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL).

Keywords

biocreative, relation extraction, NER, biomedical NLP, NLP
