
pmid: 29620920
Abstract: The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing tree construction time by 52.7% and query time by 39–85%, at the price of up to 3x memory consumption during queries. Notably, it can process a batch of 198,074 queries in under 8 hours (compared to around two days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 minutes.
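For readers unfamiliar with the baseline structure, the following is a minimal sketch of a plain Sequence Bloom Tree query, assuming the design as commonly described: each leaf holds a Bloom filter of one experiment's k-mers, each internal node holds the bitwise union of its children's filters, and a query descends into a node only if at least a fraction θ of the query's k-mers appear in that node's filter. Pruning is safe because a child's filter is a bitwise subset of its parent's, so no leaf below a failing node can pass. This illustrates the original structure only, not the AllSome variant proposed in the paper. All names (`Bloom`, `leaf`, `internal`, `query`) and parameters (filter size `M`, hash count `H`, k, θ) are hypothetical choices for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

M = 1 << 12  # hypothetical filter size in bits
H = 3        # hypothetical number of hash functions


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def positions(kmer: str) -> List[int]:
    """Derive H bit positions from a SHA-256 digest of the k-mer."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M for i in range(H)]


class Bloom:
    """Bloom filter whose bit array is stored as a Python int."""

    def __init__(self, bits: int = 0) -> None:
        self.bits = bits

    def add(self, kmer: str) -> None:
        for p in positions(kmer):
            self.bits |= 1 << p

    def __contains__(self, kmer: str) -> bool:
        # May return false positives, never false negatives.
        return all((self.bits >> p) & 1 for p in positions(kmer))

    def union(self, other: "Bloom") -> "Bloom":
        return Bloom(self.bits | other.bits)


@dataclass
class Node:
    bloom: Bloom
    name: Optional[str] = None  # experiment name, set only on leaves
    children: List["Node"] = field(default_factory=list)


def leaf(name: str, seq: str, k: int) -> Node:
    """Build a leaf whose filter holds every k-mer of one experiment."""
    bloom = Bloom()
    for km in kmers(seq, k):
        bloom.add(km)
    return Node(bloom, name)


def internal(*children: Node) -> Node:
    """Build an internal node whose filter is the union of its children."""
    bloom = Bloom()
    for child in children:
        bloom = bloom.union(child.bloom)
    return Node(bloom, children=list(children))


def query(node: Node, qkmers: List[str], theta: float) -> List[str]:
    """Return names of leaves containing at least theta of the query k-mers."""
    present = sum(km in node.bloom for km in qkmers)
    if present < theta * len(qkmers):
        return []  # prune: no leaf in this subtree can score higher
    if not node.children:
        return [node.name]
    hits: List[str] = []
    for child in node.children:
        hits.extend(query(child, qkmers, theta))
    return hits


if __name__ == "__main__":
    k = 5
    root = internal(leaf("exp_A", "ACGTACGTTGCA", k),
                    leaf("exp_B", "TTTTGGGGCCCC", k))
    print(query(root, kmers("ACGTACGTTG", k), theta=0.9))
```

In this toy tree, the query transcript's k-mers all occur in `exp_A`, so that leaf is reported, while `exp_B` is rejected unless Bloom-filter false positives happen to cover nearly every query k-mer. Real indexes use far larger filters and trees; the sketch only shows the union-and-prune logic.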
Keywords: sequence Bloom trees, algorithms, data structures, Humans, Breast, [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], bepress|Biology, bepress|Life Sciences|Biology, Brain, Computational Biology, High-Throughput Nucleotide Sequencing, bioinformatics, Sequence Analysis, DNA, Blood, bepress|Life Sciences|Bioinformatics, Bloom filters, Female, RNA-seq, bepress|Bioinformatics, Databases, Nucleic Acid, Transcriptome, Algorithms, Software
| Indicator | Description | Value |
|---|---|---|
| Selected citations | Citations derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 44 |
| Popularity | The "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | The overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | The initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
