'&%$£ In = &%$£ Out: How Controlled Vocabularies and Metadata Standards Are Fundamental for Developing Open Research Indicators

In 2024 the UK Reproducibility Network (UKRN) initiated a set of pilots involving institutional members and solution providers to establish good practice in institutional monitoring of Open Research through the creation of robust indicators. The Open Research Indicators Pilot was sector-led, with institutions and solution providers working together to develop, test, and evaluate prototype machine learning solutions with valid, reliable, and ethical indicators for measuring Open Research. The University of Bristol led the ‘Openness of Data’ pilot and assessed providers’ data to ascertain the usefulness of machine learning for this purpose. The pilot’s findings highlight the inherent challenges and limitations of monitoring and assessing published datasets for openness within a research landscape that prioritises articles as benchmark outputs: the combination of article primacy and existing publisher and repository systems means datasets can currently only be monitored through Data Availability Statements (DAS). Our analysis of machine learning tools confirmed an uncomfortable truth many in the research data management (RDM) community suspected. We do not have enough openly available, machine-actionable metadata for digital tools to extract DAS reliably and accurately, and we are not doing enough at the human interface with researchers to ensure that their DAS are easy to understand and describe how others can find their data. Both shortcomings hamper the measurement of openness.

Keywords

Paper, Machine Learning, Data Availability Statements, Creating and sustaining communities for curation support and development, Open Data, Developing new curation tools and services, Curation challenges and opportunities from Artificial Intelligence and Machine Learning, Metadata Standards, Controlled Language
