Abstract

Motivation

Current sequencing technologies are able to produce reads orders of magnitude longer than was ever possible before. Such long reads have sparked a new interest in de novo genome assembly, which removes reference biases inherent to re-sequencing approaches and allows for a direct characterization of complex genomic variants. However, even with the latest algorithmic advances, assembling a mammalian genome from long error-prone reads incurs a significant computational burden and does not preclude occasional misassemblies. Both problems could potentially be mitigated if assembly could commence for each chromosome separately.

Results

To address this, we show how single-cell template strand sequencing (Strand-seq) data can be leveraged to separate reads by chromosome. We introduce a novel latent variable model and a corresponding Expectation Maximization algorithm, termed SaaRclust, and demonstrate its ability to reliably cluster long reads by chromosome. For each long read, this approach produces a posterior probability distribution over all chromosomes of origin and read directionalities. In this way, it allows one to assess the amount of uncertainty inherent to sparse Strand-seq data at the level of individual reads. Among the reads that our algorithm confidently assigns to a chromosome, we observed more than 99% correct assignments on a subset of Pacific Biosciences reads with 30.1× coverage. To our knowledge, SaaRclust is the first approach for the in silico separation of long reads by chromosome prior to assembly.

Availability and implementation

https://github.com/daewoooo/SaaRclust
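The idea of keeping only confidently assigned reads can be illustrated with a minimal sketch. It is not SaaRclust's code or API: the function name `confident_assignments`, the dictionary layout, and the cutoff of 0.9 are all assumptions made for illustration. The sketch takes a per-read posterior distribution over clusters (one cluster per chromosome and directionality) and keeps a read only if its most probable cluster reaches the cutoff.

```python
# Illustrative sketch only; not taken from the SaaRclust repository.
def confident_assignments(posteriors, threshold=0.9):
    """Return {read_id: cluster_index} for confidently assigned reads.

    posteriors: dict mapping a read ID to a list of probabilities, one per
        (chromosome, directionality) cluster, summing to 1.
    threshold: hypothetical confidence cutoff (an assumption, not a value
        stated in the abstract).
    """
    assigned = {}
    for read_id, probs in posteriors.items():
        # Index of the cluster with the highest posterior probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            assigned[read_id] = best
    return assigned


# Toy posteriors over three hypothetical clusters.
toy = {
    "read_a": [0.97, 0.02, 0.01],
    "read_b": [0.40, 0.35, 0.25],
    "read_c": [0.05, 0.05, 0.90],
}
result = confident_assignments(toy)
```

Under this cutoff, `read_b` is left unassigned because no single cluster dominates its posterior. Lowering the cutoff assigns more reads at the cost of more uncertain assignments.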
ISMB 2018–Intelligent Systems for Molecular Biology Proceedings, Cancer Research, Genome, Human, High-Throughput Nucleotide Sequencing, Genomics, Sequence Analysis, DNA, Chromosomes, Human, Humans, Computer Simulation, Female, de novo, genome, assembly, long read, sequencing, Strand-seq, Algorithms, Software
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 28 |
| Popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
| Views | | 3 |
| Downloads | | 3 |

Views provided by UsageCounts
Downloads provided by UsageCounts