ZENODO
Software . 2026
License: CC BY
Data sources: Datacite

Argo workflow - BioFlow-Ontology mapping

Authors: Puget Gil, Jey;

Abstract

Argo Workflow to Knowledge Graph Extractor

This script (create-wf.py) parses the execution JSON of an Argo Workflow and translates its metadata, architecture, and execution details into a semantic knowledge graph. The output is serialized in the RDF Turtle (.ttl) format.

Goal

The main objective of this script is to capture the provenance, lifecycle, and structural data of workflows executed via Argo Workflows, and to map them to standard ontologies and vocabularies. The generated knowledge graph uses several semantic namespaces:

* ShareFair: for workflow, subworkflow, step, and variable representation.
* PROV-O: for tracking agents and creators (provenance).
* Schema.org: for temporal data (start/end times), resource consumption, and basic property values.
* P-PLAN: for aligning steps and plans.
* Workflow-Run: for tracking computing resource usage.

Once expressed as a graph, workflow executions become queryable and interoperable, which supports the FAIR (Findable, Accessible, Interoperable, Reusable) data principles for computational pipelines.

Key Features

1. Workflow & Agent Extraction: maps the overall workflow template, its execution instance, and the agent (creator).
2. Node Topology: distinguishes between DAG/TaskGroup nodes and executable steps, and preserves the parent-child relationships.
3. Execution Tracking: records start and finish times for the workflow and for individual steps.
4. Resource Monitoring: extracts the computational resources consumed during step execution.
5. Data Flow (Inputs/Outputs): differentiates between abstract workflow variables (parameters defined in the template) and concrete entities (the actual values/artifacts passed during execution).
6. URI Normalization: automatically sanitizes and hashes complex string identifiers to generate valid URIs, preventing RDF serialization errors.

Prerequisites

Ensure you have Python installed along with the required dependencies.
You can install the required packages using the requirements.txt file set up for this workspace:

    pip install -r requirements.txt

Usage

The script is executed from the command line and requires at least the path to the Argo Workflow JSON file. You can optionally specify the path of the output `.ttl` file.

Command Syntax:

    python create-wf.py [output-knowledge-graph.ttl]

Output

The script generates a `.ttl` file containing triples that represent the workflow's structure and execution. This file can be imported into triple stores (such as GraphDB, Virtuoso, or Blazegraph) or parsed dynamically by SPARQL engines for further analysis.

References

- El Garb, M., Coquery, E., Duchateau, F., & Lumineau, N. (2025, July). Improving reproducibility in bioinformatics workflows with BioFlow-Model. In Proceedings of the 3rd ACM Conference on Reproducibility and Replicability (pp. 202-207). https://dl.acm.org/doi/full/10.1145/3736731.3746139
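Illustrative sketch

The node topology and URI normalization features described above can be pictured with the following minimal sketch. It is not taken from create-wf.py: the function names (`safe_uri`, `describe_nodes`), the `http://example.org/workflow/` base namespace, the set of node types treated as groups, and the use of each node's `boundaryID` to find its enclosing group are all assumptions made for illustration. It relies only on the `status.nodes` map that Argo Workflows writes into the workflow JSON, and on the Python standard library.

```python
import hashlib
import re

# Hypothetical base namespace; the real script may mint URIs differently.
BASE = "http://example.org/workflow/"

# Assumption: these Argo node types are structural groupings, everything
# else (for example "Pod") is treated as an executable step.
GROUP_TYPES = {"DAG", "TaskGroup", "Steps", "StepGroup"}


def safe_uri(identifier: str, base: str = BASE) -> str:
    """Return a URI whose local part contains only [A-Za-z0-9_-]."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", identifier).strip("_")
    if cleaned and cleaned == identifier:
        return base + cleaned
    # The identifier had to be altered: append a short hash of the original
    # so that two different identifiers do not collapse to the same URI.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()[:8]
    return f"{base}{cleaned}_{digest}"


def describe_nodes(workflow: dict) -> list:
    """Summarise every entry of status.nodes in an Argo Workflow JSON dict."""
    nodes = workflow.get("status", {}).get("nodes") or {}
    summary = []
    for node_id, node in nodes.items():
        # Assumption: boundaryID (the enclosing template's root node) is
        # used as the parent; the real script may follow "children" instead.
        parent = node.get("boundaryID")
        summary.append({
            "uri": safe_uri(node_id),
            "kind": "group" if node.get("type") in GROUP_TYPES else "step",
            "parent": safe_uri(parent) if parent else None,
            "started": node.get("startedAt"),
            "finished": node.get("finishedAt"),
            "resources": node.get("resourcesDuration", {}),
        })
    return summary
```

Each returned dictionary holds the information that would then be turned into triples (type, parent link, start and finish times, resource usage). Appending a hash only when sanitization changes the identifier keeps readable URIs for simple names while still avoiding collisions between identifiers that differ only in their special characters.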

Keywords

Workflow
