ZENODO
Project deliverable . 2021
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2021
License: CC BY
Data sources: ZENODO

D3.1 Semantic resources

Authors: Andres Garcia-Silva; Jose Manuel Gomez-Perez; Raul Ortega; Esteban González Guardia

Abstract

This deliverable describes the extension and customization of the text mining and enrichment services that will be integrated in EOSC during RELIANCE. Currently available through ROHub.org, these services provide a variety of scientific communities with text analytics functionalities, helping to make research data, materials, and results machine-readable and easier to discover by scientists and machines alike. Although already in operation, the services need to be tailored to the specific vocabulary used by the RELIANCE research communities, and therefore extended and customized, so that they can deliver domain-specific information to those communities.

We start with a survey in which we elicit from the communities the fields of research that are relevant to their work, as well as the key journals and venues where they usually communicate their results. Then, from SciGraph, a knowledge graph of scientific publications released by Springer Nature, we harvest a corpus of scientific papers published in the last 5 years in those fields of research. The resulting corpus serves two purposes. First, it enhances the coverage of the scientific terminology supported by the RELIANCE text mining services. Second, it allows us to train new language models, either from scratch or by fine-tuning existing pre-trained models, which enable the development of further experimental text mining services based on natural language understanding and machine reading comprehension of scientific documents. Herein, we mainly focus on the former, while the latter will be addressed in forthcoming deliverables.

We run a text mining analysis of our corpus with special attention to the entities, phrases, and concepts, as well as the relationships between them, that were not previously covered by our text mining and enrichment services. These linguistic artifacts, which represent the missing pieces of information needed to analyze documents of interest to the RELIANCE communities, are integrated by knowledge engineers and linguists into Sensigrafo, the lexico-semantic knowledge graph at the core of the RELIANCE text mining services. As a result of this process, the text mining and enrichment services can understand the domain terminology used by the target scientific communities in RELIANCE. The resulting text corpus and domain-specific terminology have been released and are publicly available through Zenodo.

In this deliverable, we also introduce pre-trained language models and their application in the context of the RELIANCE text mining and enrichment services. Pre-trained language models like BERT are trained on large general-purpose corpora and have proven very useful for tackling different natural language understanding challenges when fine-tuned for specific tasks on domain-specific data. Currently, language models represent the state of the art in many natural language understanding tasks. In RELIANCE, we plan to use them as a complementary resource to further improve performance in text mining tasks like text classification. In addition, we will explore the application of language models to everyday tasks in a researcher's life that can benefit from machine understanding of natural language, like the comprehension of scientific documents or the analysis of scientific claims.

Finally, this deliverable analyzes other resources that we are planning to leverage in RELIANCE, like the OpenAIRE knowledge graph, which interlinks scientific results, including papers, data, and software, across different repositories. The enrichment of such resources through the RELIANCE text mining and enrichment services will increase their findability by the EOSC communities and support the scalable creation of research objects. We also review the ongoing OpenAIRE open citation initiative, which aims at providing a citation-based graph of research work through OpenAIRE, with the potential to become a valuable resource for RELIANCE as well.
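
The idea of looking for corpus terminology that the existing services do not yet cover can be illustrated with a minimal Python sketch. It is not the method used in the deliverable: the function name `candidate_terms`, the tiny stopword list, the frequency threshold, and the plain n-gram counting are all assumptions made for illustration, and a real pipeline would rely on proper linguistic analysis and expert review before adding anything to a knowledge graph such as Sensigrafo.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "in", "of", "over", "the", "to", "we", "were"}


def candidate_terms(documents, known_terms, max_n=3, min_count=2):
    """Return (term, count) pairs for word n-grams of up to max_n words that
    occur at least min_count times in the documents and are not in known_terms.
    Results are ordered from most to least frequent."""
    known = {term.lower() for term in known_terms}
    counts = Counter()
    for doc in documents:
        # Lowercase and keep alphabetic tokens (hyphens allowed inside words).
        tokens = re.findall(r"[a-z][a-z\-]*", doc.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [
        (term, count)
        for term, count in counts.most_common()
        if count >= min_count and term not in known
    ]
```

Called with a list of abstracts and the set of terms already covered, the function returns frequent phrases that are missing from that set; those phrases would then be candidates for manual review rather than automatic additions.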

This is the draft version of the deliverable not yet approved by the European Commission.

Keywords

OpenAIRE, RO-Crate, EOSC, Data cubes, Research Objects, ROHub, RELIANCE
