
Background

This dataset presents the initial version of a Metadata Monitoring Framework developed by the PID4NFDI Coordination Hub to assess and improve the quality, completeness, and interoperability of DataCite DOI metadata within the National Research Data Infrastructure (NFDI) in Germany. The framework emerged directly from community work conducted in 2025, including two dedicated focus groups on Persistent Identifier (PID) integration in Data Management Plans (DMPs) and Electronic Lab Notebooks (ELNs), and a joint in-person workshop held in Berlin in September 2025.

Motivation

A key finding of the PID4NFDI focus group and workshop programme was that metadata quality at the point of DOI registration is highly variable across NFDI platforms and repositories. Participants identified widespread gaps in the use of recommended and optional DataCite schema fields, including subject classification, funding references, ORCID and ROR identifiers, licensing information, and relatedIdentifier entries that would support provenance and data lineage. These gaps reduce the discoverability, reusability, and machine-actionability of research outputs, and impede interoperability with aggregators such as OpenAIRE and EOSC.

Recognising that sustainable improvement requires systematic measurement rather than ad hoc curation, the PID4NFDI team developed this framework to provide NFDI platforms, repository managers, and data stewards with a structured set of Key Performance Indicators (KPIs), measurable targets, and DataCite API query patterns to audit and monitor DOI metadata quality over time.

Contents

The framework is organised as a four-sheet workbook:

About: Framework metadata, description, navigation guide, record metadata (author, ORCID, license, DOI), and changelog. This sheet orients new readers and provides all bibliographic information needed to cite or reuse the framework.

Monitoring Goals & KPIs: Defines five goal areas for monitoring (Discoverability, PID Maturity, Compliance, PID Linking, and Reusability) with 20 KPIs, each carrying a short ID (e.g. D-1, L-2), a draft threshold target, the basis for that target, priority level, assessment frequency, and responsible role. Targets are indicative and intended as a starting point for community discussion; they should be calibrated through consultation between the PID4NFDI Coordination Hub and NFDI platform operators.

API Query Catalogue: A curated catalogue of 34+ DataCite REST API queries grouped into seven use-case categories (Metadata Completeness, Present Information, Person/Organisation, Topic, Resource Type, Analysis/Reporting, and Newer Parameters). Each query carries a short ID that cross-references the KPIs sheet, and is annotated with its intended target group, rationale, verification status, and caveats. All queries were verified against the live DataCite REST API in April 2026; two queries from the original draft were corrected during verification.

Baseline Log: An operational sheet for recording actual measured values against KPIs over time. Each measurement occupies one row, with controlled drop-down lists for KPI ID, method, and query ID. The log is kept separate from the framework definition so that the published version remains clean while the log accumulates data across monitoring cycles.

Context and Related Work

This framework is published as part of the broader PID4NFDI programme, which aims to establish consistent, interoperable PID practices across all 26 NFDI consortia. It complements the Landscape Analysis of PID Practices in NFDI (El-Gebali & Böhm, 2025; https://doi.org/10.5281/zenodo.15689799) and forms a technical foundation for two incubator projects launching in 2026 in collaboration with TS4NFDI (Terminology Services for NFDI), which will operationalize the DataCite metadata schema and integrate controlled vocabulary services into ELN and DMP tools.

The framework is version 1.0 and is intended to evolve iteratively as community input is gathered, targets are validated against real data, and the Baseline Log sheet is populated through pilot monitoring activities.

Intended Audience

Repository managers and platform operators within NFDI and DataCite member organizations; data stewards seeking structured guidance on metadata quality monitoring; infrastructure teams implementing PID workflows in ELNs and DMPs; and the broader research data management and FAIR community.
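
Illustrative example

For readers who want to see how a catalogue query and a Baseline Log entry might fit together, the following is a minimal Python sketch and not part of the workbook. It assumes the public DataCite REST API endpoint at https://api.datacite.org/dois, the `query`, `client-id`, and `page[size]` parameters, and that the hit count is read from `meta.total` in the response. The query string `subjects.subject:*`, the repository ID `example.repo`, the query ID `Q-EXAMPLE`, and the row field names are hypothetical placeholders; the verified queries and their caveats are those listed in the API Query Catalogue sheet.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed public DataCite REST API endpoint for DOI searches.
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_count_url(query, client_id=None):
    """Build a search URL that asks for zero records per page,
    so that only the summary (assumed: meta.total) is of interest."""
    params = {"query": query, "page[size]": 0}
    if client_id:
        # Restrict the count to one repository account (hypothetical ID in the example).
        params["client-id"] = client_id
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def coverage(matching, total):
    """Percentage of DOIs that satisfy a KPI, or None if there are no DOIs."""
    if total == 0:
        return None
    return round(100 * matching / total, 1)


def baseline_row(kpi_id, query_id, matching, total, measured_on=None):
    """One Baseline Log row per measurement (field names are illustrative)."""
    return {
        "date": measured_on or date.today().isoformat(),
        "kpi_id": kpi_id,
        "method": "DataCite REST API",
        "query_id": query_id,
        "value_percent": coverage(matching, total),
    }


if __name__ == "__main__":
    # Placeholder query: DOIs of one repository that carry any subject entry.
    print(build_count_url("subjects.subject:*", client_id="example.repo"))
    # Counts would normally come from two API responses; fixed numbers here.
    print(baseline_row("D-1", "Q-EXAMPLE", matching=25, total=200))
```

The sketch deliberately separates URL construction from the percentage calculation, so the same `coverage` helper can be reused for counts obtained by any other method recorded in the log.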
