ZENODO
Presentation . 2022
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2022
License: CC BY
Data sources: ZENODO

IDCC22 slides: Improving our measurement of data reuse: OpenAIRE and Springer Nature's collaboration to automate data link identification and classification

Authors: Smith, Graham; Papageorgiou, Haris; Stavropoulos, Petros; Matthews, Tristan

Abstract

Lightning talk presentation for the 17th International Digital Curation Conference (IDCC22) on the topic of Reusability. These slides present the methods and some preliminary results from Springer Nature and OpenAIRE's collaboration to improve data linking, specifically a workstream enabling better measurement of data reuse from existing publications.

Publishers, funders, governments and research institutions increasingly encourage researchers to make their data reusable. Making the case for FAIR data presents a challenge in how to track and measure data reuse. Springer Nature and OpenAIRE are collaborating on a text and data mining approach to improve our knowledge of data reuse. This lightning talk presents our research aims, methods and some preliminary results.

Since the publication of the FAIR principles in 2016, organisational efforts to make data FAIR have mainly focused on requirements and motivations for the dataset creator; data policies widely adopted by publishers and funders emphasise what authors should do with their data, supported by editorial guidance and checks. Technical solutions have built on the increased usage of repositories, metadata standards and the growing role of data curators. While there are clearly quantified problems FAIR data can solve (e.g. the €10.2bn cost to the European economy of non-FAIR data), researchers may ask "will the time and effort spent making my data FAIR actually lead to its reuse?".

Effective implementation of FAIR should lead to data being reused, not just theoretically reusable. This necessitates investigation of actual data reuse patterns, which presents certain challenges. The information required to link dataset creation and reuse, or creator and reuser, is often incomplete or absent. Even where technical frameworks such as Scholix have enabled progress, the required culture change for data accreditation is insufficient to provide a comprehensive picture.

A number of recent initiatives have sought to address the data reuse knowledge gap using machine learning techniques on published literature. The Coleridge Initiative’s ‘Show US the Data’ program and the NIH LitCoin Natural Language Processing (NLP) challenge are two competitions that use NLP to identify public datasets in research publications.

Springer Nature and OpenAIRE are partnering to address this issue by improving the discoverability of links between Springer Nature’s publications and underlying research data. This collaboration employs OpenAIRE’s text and data mining algorithms to detect and classify data-article links. It has the advantage of being able to analyse the existing corpus and interrogate data reuse:

  • based on authorship (reuse by the same / different authors)
  • based on discipline (reuse within / between disciplines)
  • throughout time
  • across repositories

OpenAIRE identifies datasets from the literature using data identifiers and surrounding context from the research publication. Advanced artificial intelligence and NLP techniques process the corpus, aiming to capture all instances of underlying data in the manuscript. Authors’ data citation behaviour is then disambiguated in terms of reuse and attribution to creators based on the surrounding context. The process constructs a scientific knowledge graph integrating identified datasets, linked to FAIR metadata via the manuscript. Finally, we consolidate and quantify our analysis across different topics, disciplines and organisations over time.

We intend that this work will provide a methodological basis for data link detection and classification, with insights into data reuse patterns across the published literature. Determining granular and comprehensive patterns of data reuse offers the potential for more targeted interventions to promote reusability by research organisations.
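
As a rough illustration of the two steps the abstract describes (finding data identifiers with their surrounding context, then separating reuse by the same or different authors), a minimal Python sketch follows. It is not OpenAIRE's or Springer Nature's pipeline, which the abstract says relies on AI and NLP techniques; it swaps in a plain regular expression and a set comparison. The simplified DOI pattern, the names `DataMention`, `find_data_mentions` and `classify_reuse`, the 60-character context window, and the placeholder DOI are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Simplified DOI pattern (an assumption for this sketch): "10.", a 4-9 digit
# registrant code, "/", then a suffix that runs up to whitespace or a quote.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


@dataclass
class DataMention:
    identifier: str  # the DOI-like string found in the text
    context: str     # text surrounding the identifier


def find_data_mentions(text: str, window: int = 60) -> list[DataMention]:
    """Return DOI-like identifiers with `window` characters of context on each side."""
    mentions = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that the greedy suffix may have captured.
        doi = match.group().rstrip(".,;:)")
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        mentions.append(DataMention(doi, text[start:end]))
    return mentions


def _normalise(names: set[str]) -> set[str]:
    """Lower-case and trim names; real author disambiguation is much harder."""
    return {name.strip().lower() for name in names}


def classify_reuse(article_authors: set[str], dataset_creators: set[str]) -> str:
    """Label a data-article link by author overlap (hypothetical labels)."""
    if not article_authors or not dataset_creators:
        return "unknown"
    if _normalise(article_authors) & _normalise(dataset_creators):
        return "same-author reuse"
    return "different-author reuse"


if __name__ == "__main__":
    # Placeholder sentence and DOI, invented for illustration only.
    sample = "Data were obtained from https://doi.org/10.1234/example.5678. We thank the creators."
    for mention in find_data_mentions(sample):
        print(mention.identifier, "|", mention.context)
    print(classify_reuse({"Smith, Graham"}, {"smith, graham"}))
```

A production approach would also need to handle non-DOI identifiers such as repository accession numbers, and would use the captured context to decide whether a mention reflects data creation or data reuse, which this sketch does not attempt.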

Keywords

TDM, Automation, data reusability, data sharing, data citation, data reuse, research data, NLP, Scholix

  • Impact by BIP!
    selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
  • Usage by OpenAIRE UsageCounts
    views: 20
    downloads: 1
Green
Related to Research communities: OpenAIRE