publication . Other literature type . Article . 2007

Quantifying similarity between motifs

John A. Stamatoyannopoulos; Shobhit Gupta; Timothy L. Bailey; William Stafford Noble
Open Access
  • Published: 01 Feb 2007
  • Publisher: Springer Science and Business Media LLC
Abstract
A common question within the context of de novo motif discovery is whether a newly discovered, putative motif resembles any previously discovered motif in an existing database. To answer this question, we define a statistical measure of motif-motif similarity, and we describe an algorithm, called Tomtom, for searching a database of motifs with a given query motif. Experimental simulations demonstrate the accuracy of Tomtom's E values and its effectiveness in finding similar motifs.
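As a rough illustration of what searching a motif database with a query motif involves, the sketch below slides a query over each target at every ungapped offset and ranks targets by their best mean column distance. It is not Tomtom's method: the Euclidean column distance, the `min_overlap` parameter, and the function names are all assumptions made for this example, and no p-values or E values are computed.

```python
import math

# A motif is a list of columns; each column holds four base
# frequencies (A, C, G, T). This layout is an assumption of the sketch.

def column_distance(a, b):
    """Euclidean distance between two frequency columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_alignment(query, target, min_overlap=2):
    """Return (score, offset) for the best ungapped alignment.

    Query column i is compared with target column i + offset.
    The score is the mean column distance over the overlap, so
    lower is more similar. Returns None if no offset gives at
    least min_overlap aligned columns.
    """
    best = None
    lowest = -(len(query) - min_overlap)
    highest = len(target) - min_overlap
    for offset in range(lowest, highest + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_distance(q, t) for q, t in pairs) / len(pairs)
        if best is None or score < best[0]:
            best = (score, offset)
    return best

def search(query, database, min_overlap=2):
    """Rank every target in a {name: motif} dict, best match first."""
    hits = []
    for name, target in database.items():
        result = best_alignment(query, target, min_overlap)
        if result is not None:
            hits.append((result[0], name, result[1]))
    return sorted(hits)
```

A raw distance like this is not comparable across targets of different widths or column compositions, which is the kind of problem a statistical similarity measure with calibrated E values is meant to address.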
Subjects
free text keywords: Method, Amino Acid Motifs, Sequence homology, Genetics, Biology, Computational biology, Motif (music), Eukaryotic Linear Motif resource