
Metadata and software descriptors help realize the FAIR principles for software. Various efforts exist around research software metadata (e.g., CodeMeta, Bioschemas, maSMP schema) as well as metadata extraction (e.g., SOMEF, HERMES, MAUS). Despite these efforts, existing tools and schemas remain fragmented, cover only limited metadata, and are rarely built for large-scale, automated processing or enrichment with modern AI techniques. To address this gap, the Connected Open Source Software (ConnOSS) project aims to provide a consistent infrastructure for metadata extraction and publication, enabling researchers to create harmonized software descriptions and facilitating metadata harvesting by registries and aggregators. To this end, the project plans to analyze and extend existing research software metadata schemas, identify metadata sources, and develop a harmonized extraction pipeline for platforms such as GitHub and GitLab. Machine learning models trained on a curated corpus are planned to extract, enrich, and validate metadata from README files, addressing current automation gaps. A publication workflow is then intended to make the metadata accessible to humans and machines via GitHub/GitLab pages. In this poster, we introduce ConnOSS and present a preliminary comparison of research software metadata extractors, which will later be used to define the requirements for the ConnOSS metadata extractor. This work is part of the contributions to the deRSE 2026 Conference (see https://events.hifis.net/event/2945/contributions/21334/). This work has been supported by the German Research Foundation (DFG) through the project ConnOSS with project number 561044496.
Metadata, Metadata extraction, Research software
