ConnOSS and Metadata Extraction for Research Software

Metadata and software descriptors help realize the FAIR principles for software. Various efforts address research software metadata (e.g., CodeMeta, Bioschemas, maSMP schema) as well as metadata extraction (e.g., SOMEF, HERMES, MAUS). Despite these efforts, existing tools and schemas remain fragmented, cover limited metadata, and are rarely built for large-scale, automated processing or enrichment with modern AI techniques. To address this gap, the Connected Open Source Software (ConnOSS) project aims to provide a consistent infrastructure for metadata extraction and publication, enabling researchers to create harmonized software descriptions and facilitating metadata harvesting by registries and aggregators. The project aims to analyze and extend existing research software metadata schemas, identify metadata sources, and develop a harmonized extraction pipeline for platforms like GitHub and GitLab. Machine learning models trained on a curated corpus are planned to extract, enrich, and validate metadata from README files, addressing current automation gaps. A publication workflow is then intended to make the metadata accessible to humans and machines via GitHub/GitLab pages. In this poster, we introduce ConnOSS and present a preliminary comparison of research software metadata extractors, which will later be used to define the requirements for the ConnOSS metadata extractor.

This work is part of the contributions to the deRSE 2026 Conference; see https://events.hifis.net/event/2945/contributions/21334/. This work has been supported by the German Research Foundation (DFG) through the project ConnOSS, project number 561044496.
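To make the idea of README-based metadata extraction concrete, the following is a minimal rule-based sketch that maps a README to a CodeMeta-style record. It is an illustration only: it is not the ConnOSS extractor, it uses simple regular expressions instead of the machine learning models the project plans, and it does not reflect how SOMEF, HERMES, or MAUS work. The function name extract_codemeta and the sample README are hypothetical.

```python
import json
import re


def extract_codemeta(readme_text: str) -> dict:
    """Build a minimal CodeMeta-style record from README text (hypothetical helper)."""
    record = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
    }

    # Assumption: the first Markdown level-1 heading is the software name.
    title = re.search(r"^#\s+(.+)$", readme_text, flags=re.MULTILINE)
    if title:
        record["name"] = title.group(1).strip()

    # Assumption: the first non-empty, non-heading line is the description.
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            record["description"] = stripped
            break

    # Collect the first DOI-like string, if any, and drop trailing punctuation.
    doi = re.search(r"\b10\.\d{4,9}/[^\s\)\]]+", readme_text)
    if doi:
        record["identifier"] = "https://doi.org/" + doi.group(0).rstrip(".,;")

    return record


if __name__ == "__main__":
    # Hypothetical README used only for demonstration.
    sample = "# DemoTool\n\nA small tool for demo purposes.\n\nCite as doi:10.1234/demo.5678.\n"
    print(json.dumps(extract_codemeta(sample), indent=2))
```

A heuristic like this fails as soon as a README departs from the assumed layout, which is one reason learned extraction and subsequent validation steps are of interest.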

Related Organizations

Leibniz Association
Germany
German National Library of Medicine
Germany
Leibniz Institute for the Social Sciences
Germany
Oldenburger Institut für Informatik
Germany
Carl von Ossietzky University of Oldenburg
Germany

Keywords

Metadata, Metadata extraction, Research software

