ZENODO
Conference object . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Article . 2025
License: CC BY
Data sources: Datacite

MaRDI's Zenodo Community for Graphical Modeling and Causal Inference

Authors: Mareis, Leopold; Haug, Stephan; Drton, Mathias

Abstract

The Graphical Modeling and Causal Inference (GMCI) community, hosted on Zenodo [1][A], supports researchers developing statistical methodology via a curated collection of datasets and notebooks. Adopting the FAIR principles [2], it seeks to host and moderate topical contributions of datasets, metadata, data analyses, and methodological implementations in an enriched repository akin to related efforts such as CauseMe [3], OpenML [4], and PhysioNet [5]. As a Zenodo community, GMCI provides free long-term storage with unique digital object identifiers (DOIs), which help build a stable network of digital research data. Because the community follows an open data access strategy with transparently communicated design decisions, submissions are findable and readily reusable. The community was initiated as part of MaRDI, the Mathematical Research Data Initiative, a consortium of the National Research Data Infrastructure (NFDI), and its entries are embedded into the online MaRDI knowledge graph [B], which has over 5 million nodes representing publications, software, models, authors, and further information.

The GMCI community primarily addresses statisticians working on methods for structure learning and causal effect estimation in the context of probabilistic and causal graphical models [C], [6]–[8]. These problems are notable because the quality of estimates cannot be validated using standard tabular datasets alone: the estimates target underlying stochastic dependence structures or unobserved interventional regimes. Hence, empirical comparisons of estimation methods rely on enriched datasets, which must include some information on a ground truth. Drawing on a well-curated initial collection of enriched datasets, the GMCI community is designed to establish best-practice standards that also apply to further moderated submissions from the broader academic community.

There are currently two options for contributing to the GMCI community's Zenodo repository [A]: datasets (or dataset collections) and software. The complete submission procedure is detailed online [D]. Any submission requires literature references to the origin of the datasets or methodologies and valid licensing information. Community moderators [E] process each submission, create the necessary embedding links, request revisions where needed, and assist with questions. As of April 2025, there were 14 curated datasets online with more than 2,000 recorded downloads.

Researchers in various fields face common challenges when modeling data and making causal inferences, as they must apply the available algorithms correctly and draw appropriate conclusions from the obtained results. To connect developed methodologies and raw datasets, we are currently curating educational material as references for data analysis workflows. Understanding workflows and arguments will increase the quality of statistical analyses by both statisticians and non-statisticians and will enable researchers to share their insights with the community. Software contributions that demonstrate and effectively communicate relevant research findings are connected to the community through Zenodo's git release integration. As these contributions are currently presented only as a folder structure, we are working on an online book to present the statistical notebooks jointly.

In summary, the GMCI community offers a stable, structured platform that accumulates enriched datasets and methodological implementations. Contributions gain visibility within a relevant audience, including non-statistical researchers, while also meeting citation requirements through assigned DOIs. By curating high-quality submissions and integrating them into a broader research network, we support transparent, reusable, and collaborative data sharing.
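
To illustrate why structure learning methods can only be compared on datasets that ship with a ground truth, the following minimal sketch scores an estimated directed graph against a known true graph using the structural Hamming distance. It is an illustration under stated assumptions, not part of the GMCI tooling: the function name, the adjacency-matrix encoding, and the three-variable example graphs are hypothetical, and the convention of counting a reversed edge as a single error is one of several in use.

```python
import numpy as np


def structural_hamming_distance(true_adj, est_adj):
    """Count variable pairs whose edge status differs between two directed graphs.

    Both inputs are square 0/1 adjacency matrices where entry [i, j] == 1
    means an edge i -> j. A missing, extra, or reversed edge between a pair
    of variables counts as one error (an assumed convention).
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    if true_adj.ndim != 2 or true_adj.shape[0] != true_adj.shape[1]:
        raise ValueError("adjacency matrices must be square")
    if true_adj.shape != est_adj.shape:
        raise ValueError("adjacency matrices must have the same shape")

    # Entry-wise disagreement between the two graphs.
    diff = true_adj != est_adj
    # A pair {i, j} is wrong if either direction disagrees.
    pair_mismatch = diff | diff.T
    # Count each unordered pair once via the strict upper triangle.
    return int(np.triu(pair_mismatch, k=1).sum())


# Hypothetical ground truth: X -> Y -> Z (variables indexed 0, 1, 2).
true_graph = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

# Hypothetical estimate: X -> Y, Z -> Y (reversed), X -> Z (extra edge).
estimated_graph = np.array([
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
])

shd = structural_hamming_distance(true_graph, estimated_graph)
print(shd)  # 2: one reversed edge plus one extra edge
```

Without the `true_graph` matrix, no such score can be computed from the tabular data alone, which is the reason an enriched dataset must carry some ground-truth information.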

Keywords

NFDI MaRDI, Community, Statistical Causality, Dataset Repository
