Publication . Preprint . Other literature type . Article . 2019

Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv

Khan, Farah Zaib; Soiland-Reyes, Stian; Sinnott, Richard O.; Lonie, Andrew; Goble, Carole; Crusoe, Michael R.
Open Access
  • Published: 01 Nov 2019
  • Country: United Kingdom
Abstract
Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable automation, scaling, adaptation, and provenance support. However, there are still several challenges associated with the effective sharing, publication, and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms.
Results: …
Subjects
free text keywords: Provenance, Common Workflow Language, CWL, Research Object, RO, BagIt, Interoperability, Scientific Workflows, Containers, Research, Software engineering, business.industry, business, Suite, Reuse, Software, Automation, Best practice, Workflow, Executable, computer.file_format, computer
Funded by
EC | BioExcel
Project
BioExcel
Centre of Excellence for Biomolecular Research
  • Funder: European Commission (EC)
  • Project Code: 675728
  • Funding stream: H2020 | RIA
EC | BioExcel-2
Project
BioExcel-2
BioExcel Centre of Excellence for Computational Biomolecular Research
  • Funder: European Commission (EC)
  • Project Code: 823830
  • Funding stream: H2020 | RIA
EC | IBISBA 1.0
Project
IBISBA 1.0
Industrial Biotechnology Innovation and Synthetic Biology Accelerator
  • Funder: European Commission (EC)
  • Project Code: 730976
  • Funding stream: H2020 | RIA
EC | EOSCpilot
Project
EOSCpilot
The European Open Science Cloud for Research Pilot Project
  • Funder: European Commission (EC)
  • Project Code: 739563
  • Funding stream: H2020 | RIA
Communities
EGI Federation: EGI Projects: EOSCpilot
Agricultural and Food Sciences: AGINFRA+ Projects: European Open Science Cloud - pilot
Zenodo
Preprint . 2018
Provider: Zenodo
Zenodo
Preprint . 2019
Provider: Zenodo
Zenodo
Other literature type . 2019
Provider: DataCite