Deconvolute individual genomes from metagenome sequences through short read clustering

Publication: Article, Other literature type | 08 Apr 2020 | United States | English | Publisher: PeerJ | Journal: PeerJ, volume 8, page e8966 (eISSN: 2167-8359)

Authors: Kexue Li; Yakang Lu; Li Deng; Lili Wang; Lizhen Shi; Zhong Wang;

doi: 10.7717/peerj.8966

pmid: 32296615

pmc: PMC7150542

Abstract

Metagenome assembly from short next-generation sequencing reads is challenging because of the scale of the data and the computational complexity involved. Clustering short reads by species before assembly makes it possible to assemble the genomes in parallel downstream, with optimization tailored to each genome. However, current read clustering methods suffer from either false-negative (under-clustering) or false-positive (over-clustering) problems. Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.
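To make the idea of under-clustering concrete, the following is a toy sketch in Python and not the SpaRC implementation. It assumes reads are linked when they share at least one k-mer, and it stands in for the "statistics derived from multiple samples" with a cosine similarity between per-sample abundance profiles; the abstract does not specify which statistic is used. The function names (`cluster_reads`, `merge_by_coabundance`), the thresholds, the reads, and the abundance counts are all hypothetical, and reverse complements are ignored for brevity.

```python
import math
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of k-mers contained in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=5, min_shared=1):
    """Group reads that share at least `min_shared` k-mers (connected components)."""
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Index: k-mer -> ids of the reads containing it.
    index = defaultdict(list)
    for i, read in enumerate(reads):
        for km in kmers(read, k):
            index[km].append(i)

    # Count shared k-mers for every pair of reads that co-occur under a k-mer.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(ids, 2):
            shared[(a, b)] += 1

    for (a, b), n in shared.items():
        if n >= min_shared:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    return [find(i) for i in range(len(reads))]


def cosine(u, v):
    """Cosine similarity of two abundance vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def merge_by_coabundance(labels, profiles, min_cos=0.95):
    """Merge clusters whose per-sample abundance profiles are nearly proportional.

    `profiles` maps each cluster label to its read counts per sample
    (assumed input; every label in `labels` must have a profile).
    """
    rep = {c: c for c in profiles}

    def find(c):
        while rep[c] != c:
            c = rep[c]
        return c

    for a, b in combinations(sorted(profiles), 2):
        if cosine(profiles[a], profiles[b]) >= min_cos:
            ra, rb = find(a), find(b)
            if ra != rb:
                rep[max(ra, rb)] = min(ra, rb)

    return [find(c) for c in labels]


# Hypothetical reads: 0, 1 and 4 come from one genome, 2 and 3 from another.
# Read 4 does not overlap reads 0 or 1, so k-mer sharing alone leaves it isolated.
reads = ["AAAACCCCGG", "CCCCGGGGTT", "ATATATGCGC", "ATGCGCGCAT", "TTTTGGGGAA"]
labels = cluster_reads(reads, k=5)

# Hypothetical read counts of each cluster across three samples.
profiles = {0: [10, 2, 8], 2: [1, 9, 3], 4: [5, 1, 4]}
merged = merge_by_coabundance(labels, profiles)
print(labels)  # [0, 0, 2, 2, 4]
print(merged)  # [0, 0, 2, 2, 0]
```

In this sketch the isolated read is an example of under-clustering: it belongs with reads 0 and 1 but shares no k-mer with them. Its cluster's abundance rises and falls across samples in step with theirs, so the second pass merges the two clusters while leaving the other genome separate.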

Country

United States

Related Organizations

University of California, San Francisco, United States
Shanghai University, China (People's Republic of)
Florida Southern College, United States
University of Florida, United States

Keywords

570, Short-read clustering, Apache Spark, QH301-705.5, Bioinformatics, Human Genome, Bioinformatics and Computational Biology, R, Biological Sciences, Medical and Health Sciences, 004, Networking and Information Technology R&D (NITRD), Genetics, Medicine, Biology (General), Metagenome clustering

Impact by BIP!

Selected citations: 6. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.