
doi: 10.1093/molbev/msac034 , 10.1101/2021.03.08.434372 , 10.26181/19790935.v1 , 10.26181/19790935 , 10.17863/cam.81580 , 10.17863/cam.82370
pmid: 35143670
pmc: PMC8892942
handle: 11343/317048
Abstract Bioinformatic research relies on large-scale computational infrastructure, which has a nonzero carbon footprint, but so far no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, and 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and that simple software upgrades could make it greener; for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce the carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm’s greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to a greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions that empower a move toward greener research.
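To make the kind of estimate described in the abstract concrete, the sketch below follows the general form of the model published for the Green Algorithms calculator: energy is running time multiplied by the power drawn by cores and memory, scaled by the data center's power usage effectiveness (PUE), and the footprint is that energy multiplied by the carbon intensity of the local electricity. It is a minimal illustration, not the calculator's code. The function name `carbon_footprint_gco2e` and every numeric input (core power, memory power per GB, PUE values, carbon intensity, job size) are assumptions chosen for the example and are not taken from the text above.

```python
def carbon_footprint_gco2e(runtime_h, n_cores, core_power_w, core_usage,
                           memory_gb, memory_power_w_per_gb, pue,
                           carbon_intensity_g_per_kwh):
    """Estimate the footprint of one job in grams of CO2 equivalent.

    Hypothetical helper: power (W) = cores + memory, energy (kWh) is
    power * time * PUE / 1000, footprint = energy * carbon intensity.
    """
    power_w = (n_cores * core_power_w * core_usage
               + memory_gb * memory_power_w_per_gb)
    energy_kwh = runtime_h * power_w * pue / 1000.0
    return energy_kwh * carbon_intensity_g_per_kwh


# Illustrative job; all values below are assumptions, not results from the paper.
job = dict(runtime_h=10.0, n_cores=8, core_power_w=12.0, core_usage=1.0,
           memory_gb=64.0, memory_power_w_per_gb=0.3725,
           carbon_intensity_g_per_kwh=475.0)

# Same job in an assumed average data center and an assumed efficient one.
baseline = carbon_footprint_gco2e(pue=1.67, **job)
efficient = carbon_footprint_gco2e(pue=1.10, **job)

# Same job with four times the memory requested (over-allocation).
over_allocated = carbon_footprint_gco2e(pue=1.67, **{**job, "memory_gb": 256.0})

print(f"baseline:        {baseline:.1f} gCO2e")
print(f"efficient PUE:   {efficient:.1f} gCO2e "
      f"({100 * (1 - efficient / baseline):.0f}% lower)")
print(f"4x memory:       {over_allocated:.1f} gCO2e")
```

Because PUE enters the formula as a simple multiplier, the relative saving from a more efficient data center depends only on the ratio of the two PUE values, whereas the cost of over-allocated memory grows with both the amount requested and the running time.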
carbon footprint, Bioengineering, 3105 Genetics, Genetics not elsewhere classified, green algorithms, genomics, Genetics, Discoveries, Bioinformatics and computational biology not elsewhere classified, Human Genome, Computational Biology, bioinformatics, 004, Networking and Information Technology R&D (NITRD), FOS: Biological sciences, Algorithms, Software, 12 Responsible Consumption and Production, Genome-Wide Association Study, 31 Biological Sciences
| selected citations | 84 |
| popularity | Top 1% |
| influence | Top 10% |
| impulse | Top 1% |
