
Abstract Motivation The minimizers technique is a method to sample k -mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k -mers as possible (i.e. having a low density ) is beneficial. The density is highly dependent on the choice of the order on the k -mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density, and thereby making existing and future bioinformatics tools even more efficient. Results From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k -mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the 3 type of schemes. Contact gmarcais@cs.cmu.edu ckingsf@cs.cmu.edu
Ismb 2018–Intelligent Systems for Molecular Biology Proceedings, Computational Biology, Algorithms
Ismb 2018–Intelligent Systems for Molecular Biology Proceedings, Computational Biology, Algorithms
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 45 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
