publication . Article . Other literature type . 2015

The Dfam database of repetitive DNA families

Sean R. Eddy; Robert Hubley; Robert D. Finn; Weidong Bao; Travis J. Wheeler; Thomas A. Jones; Arian F.A. Smit; Jody Clements
Open Access English
  • Published: 01 Nov 2015
  • Journal: Nucleic Acids Research, volume 44, Database issue, pages D81-D89 (ISSN: 0305-1048, eISSN: 1362-4962)
  • Publisher: Oxford University Press
  • Country: United States
Abstract
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of...
Subjects
free text keywords: Database Issue, Repeated sequence, Database, Hidden Markov model, Molecular Sequence Annotation, Human genome, Multiple sequence alignment, Genome, Sequence alignment, Biology, Annotation
Funded by
WT
  • Funder: Wellcome Trust (WT)
NIH | REPBASE UPDATE- A DATABASE OF REPETITIVE SEQUENCES
  • Funder: National Institutes of Health (NIH)
  • Project Code: 2P41LM006252-04A1
  • Funding stream: NATIONAL LIBRARY OF MEDICINE
NIH | Development and Maintenance of RepeatMasker
  • Funder: National Institutes of Health (NIH)
  • Project Code: 2R01HG002939-10
  • Funding stream: NATIONAL HUMAN GENOME RESEARCH INSTITUTE