PyBDA: a command line tool for automated analysis of big biological data sets

Publication type: Article, Other literature type

Publication date: 12 Nov 2019

Embargo end date: 01 Jan 2019

Language: English

Publisher: Springer Science and Business Media LLC

Journal: BMC Bioinformatics, volume 20 (eissn: 1471-2105)

Authors: Simon Dirmeier; Mario Emmenlauer; Christoph Dehio; Niko Beerenwinkel

doi: 10.1186/s12859-019-3087-8, 10.3929/ethz-b-000379270, 10.5451/unibas-ep74629

pmid: 31718539

pmc: PMC6849186

handle: 20.500.11850/379270

Abstract

Background: Analysing large and high-dimensional biological data sets poses significant computational difficulties for bioinformaticians due to lack of accessible tools that scale to hundreds of millions of data points.

Results: We developed a novel machine learning command line tool called PyBDA for automated, distributed analysis of big biological data sets. By using Apache Spark in the backend, PyBDA scales to data sets beyond the size of current applications. It uses Snakemake in order to automatically schedule jobs to a high-performance computing cluster. We demonstrate the utility of the software by analyzing image-based RNA interference data of 150 million single cells.

Conclusion: PyBDA allows automated, easy-to-use data analysis using common statistical methods and machine learning algorithms. It can be used with simple command line calls entirely, making it accessible to a broad user base. PyBDA is available at https://pybda.rtfd.io.

Country

Switzerland

Related Organizations

ETH Zurich, Switzerland
University of Basel, Switzerland
University of Lausanne, Switzerland
SIB Swiss Institute of Bioinformatics, Switzerland

Keywords

QH301-705.5; Biology (General); R858-859.7; Computer applications to medicine. Medical informatics; Big data; Data analysis; Command line; Pipeline; Computing cluster; Grid engine; Machine learning; Automation; Computational Biology; Computing Methodologies; Image Processing, Computer-Assisted; Software; Algorithms; Humans; HeLa Cells

Related research (5)

PyBDA: a command line tool for automated analysis of big biological data sets (2019, IsSupplementedBy)
MOESM1 of PyBDA: a command line tool for automated analysis of big biological data sets (2019, IsSupplementedBy)
h2o-3 software on GitHub (IsRelatedTo)
bench-ml software on GitHub (IsRelatedTo)
pybda software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 4
These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

popularity: Average
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.

influence: Average
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

impulse: Average
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.