Comparison of Popular Bioinformatics Databases

Bioinformatics is the application of computational tools to capture and interpret biological data. It has wide applications in drug development, crop improvement, agricultural biotechnology and forensic DNA analysis. Many databases are available to bioinformatics researchers. Each is tailored to a specific need, and they vary in size, scope and purpose. The main drawbacks of bioinformatics databases include redundant information, frequent changes, data spread across multiple databases, incomplete information, errors, and occasionally incorrect links. In addition, standard database formats, naming conventions and nomenclature are not clearly defined for many aspects of biological information, which makes information extraction more difficult. This paper presents the most widely used bioinformatics databases, which are notable for their level of redundancy and annotation, their structural coverage and their accessibility: GenBank, the Protein Information Resource (PIR), the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) database, the Protein Data Bank (PDB), the Universal Protein Resource (UniProt), Swiss-Prot, the Structural Classification of Proteins (SCOP) and the Class, Architecture, Topology, Homology (CATH) database. The key features of each database are described, detailed comparisons are made according to whether the databases are primary or secondary, and the distinctive features of each are highlighted. These databases are foundation stones of bioinformatics and are useful for rigorous benchmarking.
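Redundancy is one of the drawbacks named above. As a rough illustration only, the sketch below parses FASTA-formatted text and groups entries whose sequences are 100% identical, which is the simplest form of redundancy detection. The helper names (`parse_fasta`, `collapse_identical`) and the toy entries are hypothetical and are not taken from any of the databases discussed; real non-redundant resources such as UniRef cluster sequences at several identity thresholds and use far more elaborate methods.

```python
from collections import defaultdict


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Sequence lines are joined and upper-cased; blank lines are skipped.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collapse_identical(records):
    """Group headers whose sequences are exactly identical.

    Returns a dict mapping each distinct sequence to the list of headers
    that carry it, in first-seen order.
    """
    groups = defaultdict(list)
    for header, seq in records:
        groups[seq].append(header)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical toy entries; identifiers and "source" tags are made up.
    fasta = """>entryA source=db1
MKTAYIAKQR
QISFVKSHFS
>entryB source=db2
MKTAYIAKQRQISFVKSHFS
>entryC source=db1
MSLLTEVETP
"""
    clusters = collapse_identical(parse_fasta(fasta))
    for seq, headers in clusters.items():
        print(len(seq), headers)
```

Here `entryA` and `entryB` are the same sequence split differently across lines, so they collapse into one group, while `entryC` stays on its own. Keying a dictionary on the full sequence keeps the sketch short; large collections would normally hash or index sequences instead.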

Related Organizations

Sultan Zainal Abidin University, Malaysia
National Biotechnology Development Agency, Nigeria

Keywords

Bioinformatics, Databases & Information Technology
