
Abstract: Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx knowledge a key component of precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is not readily usable by humans or software. Natural language processing techniques have been developed and are employed to guide experts who curate this knowledge. However, existing work is limited by the absence of high-quality annotated corpora focusing on the domain. This absence restricts, in particular, the use of supervised machine learning approaches. This article introduces PGxCorpus, a manually annotated corpus designed for the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes) and the relationships between them. We present the method used to annotate texts consistently, and a baseline experiment that illustrates how this resource may be leveraged to synthesize and summarize PGx knowledge.
Statistics and Probability, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Data Descriptor, PubMed, [SDV.SP.MED] Life Sciences [q-bio]/Pharmaceutical sciences/Medication, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, Library and Information Sciences, [SDV.GEN.GH] Life Sciences [q-bio]/Genetics/Human genetics, Education, Humans, Data Curation, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], Computer Science Applications, Pharmacogenetics, [SDV.SP.PHARMA] Life Sciences [q-bio]/Pharmaceutical sciences/Pharmacology, Supervised Machine Learning, Statistics, Probability and Uncertainty, Information Systems
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 15 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
