Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

Scientific Workflow Development Using Large Language Models

Authors: Scientific, Workflows;

Abstract

This replication package contains all materials used to evaluate how Large Language Models (LLMs) support scientific workflow development in Galaxy and Nextflow. It includes the full set of prompts, LLM responses, and generated workflows analyzed in the study for three models: GPT-4o, Gemini 2.5 Flash, and DeepSeek-V3. The materials are organized into six PDF files.
The first two files provide foundational insights. The first, Table-2 Fundamental_Concepts_Of_Scientific_Workflow_and_SWS, contains the LLM-generated responses to conceptual questions about scientific workflows and workflow systems, and is used to evaluate each model’s understanding of these fundamental concepts. The second, Table-3 LLMs Understanding of Galaxy and Nextflow, examines the LLMs’ domain-specific knowledge through background questions about the Galaxy and Nextflow platforms, including their architecture, tools, reproducibility mechanisms, and key features such as Galaxy’s ToolShed, Nextflow’s DSL concepts, and nf-core integration.
The next two files, Table-4 and Table-5, contain workflow-specific background questions designed to assess LLM comprehension of domain-specific tasks within Galaxy and Nextflow, respectively. These tasks include SNP-rich exon detection, peak-to-gene association, methylation data processing, and QC pipelines.
The final two files, LLMs Generated workflows using Galaxy Workflow System and LLMs generated workflows using Nextflow Workflow System, present the complete workflows that the LLMs generated in response to structured prompts for a set of benchmark tasks. Each file gives detailed, step-by-step workflows for different tasks, covering tool selections, execution steps, file transformations, and workflow structure, and compares how each LLM structures, sequences, and explains the analyses using real-world tools and formats (e.g., FastQC, BEDTools, MultiQC).
Together, these artifacts enable full transparency and reproducibility of our multi-dimensional assessment of LLMs’ conceptual reasoning, domain understanding, and workflow-generation capabilities across two major scientific workflow systems.

Keywords

FOS: Computer and information sciences, Nextflow, Galaxy, Large Language Models, Scientific Workflows, Bioinformatics, Prompt, Gemini 2.5 Flash, GPT-4o, DeepSeek-V3
