Powered by OpenAIRE graph
Open Access
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
Open Access
ZENODO
Dataset . 2024
License: CC BY
Data sources: ZENODO
Closed Access
Lunaris
Dataset . 2024
License: CC BY
Data sources: Lunaris
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2024
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

This Research product is the result of merged Research products in OpenAIRE.

Artifact of the paper "An Empirical Investigation on the Challenges in Scientific Workflow Systems Development"

Authors: Alam, Khairul;

Abstract

Scientific Workflow Systems (SWSs) play a critical role in the contemporary scientific landscape. By augmenting productivity and fostering collaboration, SWSs elevate the standard of scholarly inquiry and strengthen its pillars of reproducibility and ethical adherence. Essentially, they serve as the bedrock upon which efficient, transparent, and impactful research is built, propelling knowledge and innovation across diverse fields. SWSs automate mundane yet essential tasks intrinsic to scientific inquiry, ranging from data acquisition to analysis and reporting. By freeing researchers from manual labor, SWSs enable them to channel their energies toward more intellectually demanding pursuits, thereby enhancing the pace and quality of research outcomes. Moreover, SWSs standardize workflows across research cohorts, bringing uniformity to experimental methodologies and data-handling practices. This standardization not only cultivates a culture of rigor and coherence but also fosters cross-disciplinary dialogue and collaboration. Integral to the operation of SWSs is their capacity to integrate diverse tools, software, and data sources, effectively functioning as centralized hubs for research management. This integration expedites the research process and facilitates seamless data exchange and interoperability, a pivotal asset in an era characterized by a deluge of data and the need for interdisciplinary collaboration. Furthermore, SWSs give researchers and project managers real-time insight into the progress of research endeavors, empowering them to identify bottlenecks, allocate resources judiciously, and optimize workflow execution. This granular oversight enhances project transparency and accountability and supports informed decision-making.
Crucially, SWSs are engineered to accommodate the complexities inherent in scientific inquiry, adeptly handling vast volumes of data and supporting parallel processing to meet the evolving demands of research projects. This scalability underscores their adaptability to diverse research paradigms, ensuring their relevance across a spectrum of scientific disciplines. Facilitating collaboration across geographic and temporal divides, SWSs offer a suite of collaborative features—including version control, shared workspaces, and communication tools—that transcend the constraints of physical proximity. By fostering a culture of inclusivity and knowledge exchange, SWSs catalyze innovation and synergy among distributed research teams. Moreover, SWSs serve as custodians of reproducibility, meticulously documenting each facet of the research workflow—from data sources to analysis methods—thus safeguarding the integrity of scientific inquiry. This commitment to transparency and methodological rigor underpins the credibility of research findings, engendering trust within the scientific community and beyond. The customizable nature of SWSs empowers research teams to tailor their workflows to suit their unique needs and preferences, further amplifying their utility and versatility. In essence, SWSs emerge not merely as tools of convenience but as indispensable allies in the relentless pursuit of scientific excellence. Numerous developers actively participate in the advancement of SWSs through diverse roles, including designing system architectures to ensure flexibility and performance, developing algorithms for data processing and analysis, crafting user-friendly interfaces, handling backend logic, integrating with external tools, and ensuring quality, security, and compliance. They address challenges such as optimizing performance and scalability by leveraging parallel processing and distributed computing techniques. 
In carrying out these diverse tasks, developers encounter numerous challenges and often turn to crowd-sourced platforms like Stack Overflow and GitHub to discuss and address them. Stack Overflow serves as a vital resource for developers to seek solutions, learn new technologies, validate best practices, and engage with the programming community. Similarly, GitHub facilitates collaborative development by allowing developers to report problems, propose enhancements, and contribute to open-source projects. Our research draws insights from Stack Overflow discussions and from GitHub issues and pull requests related to SWSs, reflecting the dynamic and collaborative nature of software development in this domain.

scientific-workflow-systems-list.xlsx contains the list of SWSs; its Used SWSs For the analysis sheet lists the projects we selected for our analysis.

Collected Stack Overflow Data.zip contains the data we extracted from Stack Overflow. After unzipping, you will find a folder, Popular SWfMS Filtered Data, which stores the data for our selected projects. On inspection, we found many posts that were unrelated to SWSs. We filtered these out and stored them in the Popular SWfMS Filtered Data/Unrelated data folder for verification. AllpostsforSelectedSWS.csv contains the merged Stack Overflow data used in our analysis.

The GitHub Data.zip archive contains three main folders. The SelectedGitHubData folder includes the raw data we initially downloaded. The PreprocessedGitHubData folder contains the data after cleaning and preprocessing. Finally, the CombinedGitHubData folder holds the merged dataset that was used for our analysis.

SO Data Other Fileds.zip contains data for other software engineering domains (i.e., mobile, security, webapp, chatbot). These datasets are used to compare SWSs with other software engineering domains. As there are more than 1 million mobile development posts, we share only the IDs of those posts. Interested readers can retrieve the posts using these IDs.
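
For the ID-only mobile data, a post can be looked up from its ID. The following is a minimal sketch, assuming the shared IDs are Stack Overflow question IDs and relying on the site's public short-link URL patterns; the function names are hypothetical and are not part of the artifact's scripts.

```python
def question_url(post_id: int) -> str:
    """Build the short link that redirects to a Stack Overflow question."""
    return f"https://stackoverflow.com/q/{int(post_id)}"


def answer_url(post_id: int) -> str:
    """Build the short link for an answer, in case an ID refers to one."""
    return f"https://stackoverflow.com/a/{int(post_id)}"


def urls_from_ids(lines):
    """Turn an iterable of ID strings (one per line) into question URLs.

    Blank lines and non-numeric rows (for example a CSV header) are skipped.
    Assumption: each row holds a single question ID.
    """
    urls = []
    for line in lines:
        token = line.strip()
        if token.isdigit():
            urls.append(question_url(int(token)))
    return urls
```

Passing an open file object to `urls_from_ids` works as well, since it only iterates over lines.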

Scripts.zip contains the scripts for downloading the issues and pull requests, for data preprocessing, and for topic modeling. You may need to change the directory paths in each file before running it. Final_Topics.zip contains the topics (RQ1) generated by the BERTopic modeling algorithm. We obtained 10 topics for the Stack Overflow data and 13 topics for the GitHub data. RQ2 Types Analysis.zip contains the question type (How, Why, What, and Other) analysis results for RQ2. For each topic, we selected a statistically significant sample to identify the types.
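
The description does not say how the per-topic sample sizes were chosen. One common way to size such a sample is Cochran's formula with a finite population correction; the sketch below is an illustration only, and the 95% confidence level (z = 1.96), 5% margin of error, and the function name are assumptions rather than the authors' reported settings.

```python
import math


def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction.

    population: number of posts in a topic
    z:          z-score for the confidence level (1.96 for 95%, assumed)
    margin:     margin of error (5%, assumed)
    p:          expected proportion; 0.5 is the most conservative choice
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink it for a finite population.
    n = n0 / (1 + (n0 - 1) / population)
    # Never ask for more items than exist.
    return min(population, math.ceil(n))
```

Under these assumed settings, a topic with 1,000 posts would call for a sample of 278 posts, and the required size levels off at 385 for very large topics.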

Country
Canada
Related Organizations
  • BIP!
    Impact by BIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average