publication . Preprint . Other literature type . Article . 2016

An External-Memory Algorithm for String Graph Construction

Yuri Pirola; Marco Previtali; Raffaella Rizzi; Gianluca Della Vedova; Paola Bonizzoni;
Open Access English
  • Published: 31 May 2016
Abstract
Some recent results (Bauer et al. in Algorithms in bioinformatics, Springer, Berlin, pp 326---337, 2012; Cox et al. in Algorithms in bioinformatics, Springer, Berlin, pp. 214---224, 2012; Rosone and Sciortino in The nature of computation. Logic, algorithms, applications, Springer, Berlin, pp 353---364, 2013) have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows---Wheeler transform of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to asse...
Subjects
free text keywords: Computer Science - Data Structures and Algorithms, Quantitative Biology - Genomics, Auxiliary memory, String graph, Burrows–Wheeler transform, Algorithm, Out-of-core algorithm, Data structure, Open problem, Theory of computation, Vertex (geometry), Computer science

1. Bankevich, A., Nurk, S., Antipov, D., et al.: SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455-477 (2012)

2. Bauer, M., Cox, A., Rosone, G.: Lightweight algorithms for constructing and inverting the BWT of string collections. Theor. Comput. Sci. 483, 134-148 (2013)

3. Bauer, M., Cox, A., Rosone, G., Sciortino, M.: Lightweight LCP construction for nextgeneration sequencing datasets. In: WABI. LNCS, vol. 7534, pp. 326-337 (2012)

4. Beretta, S., Bonizzoni, P., Della Vedova, G., Pirola, Y., Rizzi, R.: Modeling alternative splicing variants from RNA-Seq data with isoform graphs. J. Comput. Biol. 16(1), 16-40 (2014)

5. Cox, A., Jakobi, T., Rosone, G., Schulz-Trieglaff, O.: Comparing DNA sequence collections by direct comparison of compressed text indexes. In: WABI, LNCS, vol. 7534, pp. 214-224 (2012) [OpenAIRE]

6. Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63(3), 707-730 (2012) [OpenAIRE]

7. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552-581 (2005) [OpenAIRE]

8. Lam, T., Li, R., Tam, A., Wong, S., Wu, E., Yiu, S.: High throughput short read alignment via bi-directional BWT. In: BIBM '09. pp. 31-36 (2009)

9. Myers, E.: The fragment assembly string graph. Bioinformatics 21, ii79-ii85 (2005)

10. Peng, Y., Leung, H., Yiu, S., Chin, F.: IDBA - a practical iterative de Bruijn graph de novo assembler. In: RECOMB. LNCS, vol. 6044, pp. 426-440. Springer (2010)

11. Salzberg, S.L., et al.: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome research 22(3), 557-567 (2012)

12. Shi, F.: Suffix arrays for multiple strings: A method for on-line multiple string searches. In: Concurrency and Parallelism, Programming, Networking, and Security. LNCS, vol. 1179, pp. 11-22. Springer (1996)

13. Simpson, J., Durbin, R.: Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12), i367-i373 (2010) [OpenAIRE]

14. Simpson, J., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549-556 (2012)

15. Simpson, J., Wong, K., Jackman, S., et al.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117-1123 (2009)

Abstract
Some recent results (Bauer et al. in Algorithms in bioinformatics, Springer, Berlin, pp 326---337, 2012; Cox et al. in Algorithms in bioinformatics, Springer, Berlin, pp. 214---224, 2012; Rosone and Sciortino in The nature of computation. Logic, algorithms, applications, Springer, Berlin, pp 353---364, 2013) have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows---Wheeler transform of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to asse...
Subjects
free text keywords: Computer Science - Data Structures and Algorithms, Quantitative Biology - Genomics, Auxiliary memory, String graph, Burrows–Wheeler transform, Algorithm, Out-of-core algorithm, Data structure, Open problem, Theory of computation, Vertex (geometry), Computer science

1. Bankevich, A., Nurk, S., Antipov, D., et al.: SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455-477 (2012)

2. Bauer, M., Cox, A., Rosone, G.: Lightweight algorithms for constructing and inverting the BWT of string collections. Theor. Comput. Sci. 483, 134-148 (2013)

3. Bauer, M., Cox, A., Rosone, G., Sciortino, M.: Lightweight LCP construction for nextgeneration sequencing datasets. In: WABI. LNCS, vol. 7534, pp. 326-337 (2012)

4. Beretta, S., Bonizzoni, P., Della Vedova, G., Pirola, Y., Rizzi, R.: Modeling alternative splicing variants from RNA-Seq data with isoform graphs. J. Comput. Biol. 16(1), 16-40 (2014)

5. Cox, A., Jakobi, T., Rosone, G., Schulz-Trieglaff, O.: Comparing DNA sequence collections by direct comparison of compressed text indexes. In: WABI, LNCS, vol. 7534, pp. 214-224 (2012) [OpenAIRE]

6. Ferragina, P., Gagie, T., Manzini, G.: Lightweight data indexing and compression in external memory. Algorithmica 63(3), 707-730 (2012) [OpenAIRE]

7. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552-581 (2005) [OpenAIRE]

8. Lam, T., Li, R., Tam, A., Wong, S., Wu, E., Yiu, S.: High throughput short read alignment via bi-directional BWT. In: BIBM '09. pp. 31-36 (2009)

9. Myers, E.: The fragment assembly string graph. Bioinformatics 21, ii79-ii85 (2005)

10. Peng, Y., Leung, H., Yiu, S., Chin, F.: IDBA - a practical iterative de Bruijn graph de novo assembler. In: RECOMB. LNCS, vol. 6044, pp. 426-440. Springer (2010)

11. Salzberg, S.L., et al.: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome research 22(3), 557-567 (2012)

12. Shi, F.: Suffix arrays for multiple strings: A method for on-line multiple string searches. In: Concurrency and Parallelism, Programming, Networking, and Security. LNCS, vol. 1179, pp. 11-22. Springer (1996)

13. Simpson, J., Durbin, R.: Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26(12), i367-i373 (2010) [OpenAIRE]

14. Simpson, J., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549-556 (2012)

15. Simpson, J., Wong, K., Jackman, S., et al.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117-1123 (2009)

Any information missing or wrong?Report an Issue