publication . Article . Conference object . Part of book or chapter of book . Preprint . 2016

Approximate string matching using a bidirectional index

Gregory Kucherov; Kamil Salikhov; Dekel Tsur;
Open Access English
  • Published: 01 Jul 2016
  • Publisher: HAL CCSD
  • Country: France
Abstract
International audience; We study strategies of approximate pattern matching that exploit bidirectional text indexes, extending and generalizing ideas of [5]. We introduce a formalism, called search schemes, to specify search strate-gies of this type, then develop a probabilistic measure for the efficiency of a search scheme, prove several combinatorial results on efficient search schemes, and finally, provide experimental computations supporting the superiority of our strategies.
Subjects
free text keywords: approximate pattern matching, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], General Computer Science, Theoretical Computer Science, Computer Science - Data Structures and Algorithms, Dynamic programming, Exploit, Generalization, Theoretical computer science, Formalism (philosophy), Computation, Approximate string matching, Pattern matching, Probabilistic logic, Mathematics, Algorithm

1. D. Belazzougui, F. Cunial, J. K¨arkka¨inen, and V. Ma¨kinen. Versatile succinct representations of the bidirectional burrows-wheeler transform. In Proc. 21st European Symposium on Algorithms (ESA), pages 133-144, 2013. [OpenAIRE]

2. M. Burrow and D. Wheeler. A block-sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation, California, 1994.

3. L H. Y. Chen. Poisson approximation for dependent trials. The Annals of Probability, pages 534-545, 1975.

4. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. 41st Symposium on Foundation of Computer Science (FOCS), pages 390- 398, 2000.

5. T. W. Lam, R. Li, A. Tam, S. C. K. Wong, E. Wu, and S.-M. Yiu. High throughput short read alignment via bi-directional BWT. In Proc. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 31-36, 2009.

6. T. W. Lam, W. K. Sung, and S. S. Wong. Improved approximate string matching using compressed suffix data structures. In Proc. 16th International Symposium on Algorithms and Computation (ISAAC), pages 339-348, 2005.

7. B. Langmead, C. Trapnell, M. Pop, and S. Salzberg. Ultrafast and memoryefficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3):R25, 2009.

8. H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14):1754-1760, 2009. [OpenAIRE]

9. H. Li and N. Homer. A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 11(5):473-483, 2010.

10. G. Navarro and V. Ma¨kinen. Compressed full-text indexes. ACM Computing Surveys, 39(1), 2007.

11. L.M.S. Russo, G. Navarro, A.L. Oliveira, and P. Morales. Approximate string matching with compressed indexes. Algorithms, 2(3):1105-1136, 2009. [OpenAIRE]

12. T. Schnattinger, E. Ohlebusch, and S. Gog. Bidirectional search in a string with wavelet trees and bidirectional matching statistics. Information and Computation, 213:13-22, 2012. [OpenAIRE]

13. J.T. Simpson and R. Durbin. Efficient de novo assembly of large genomes using compressed data structures. Genome Research, 22(3):549-556, 2012.

14. W.-K. Sung. Indexed approximate string matching. In Ming-Yang Kao, editor, Encyclopedia of Algorithms, pages 1-99. Springer US, 2008.

Any information missing or wrong?Report an Issue