
Abstract

Background: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data, and thus enables a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families.

Description: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification into homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming, and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed through an interface for searching and browsing, and through analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering Database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families.

Conclusion: DWARF has been designed for constructing databases of large, structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure, and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering.
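The three-section relational model described in the abstract (protein, sequence, structure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table names, column names, and the use of SQLite are hypothetical and are not taken from DWARF's actual schema, and the inserted rows are placeholder data, not records from the Lipase Engineering Database.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not DWARF's actual data model.
SCHEMA = """
-- Protein section: classification, function, source organism
CREATE TABLE superfamily (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE homologous_family (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    superfamily_id INTEGER NOT NULL REFERENCES superfamily(id)
);
CREATE TABLE protein (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    biochemical_function TEXT,
    source_organism TEXT,
    family_id INTEGER NOT NULL REFERENCES homologous_family(id)
);
-- Sequence section: residues plus position-specific annotation
CREATE TABLE sequence (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    residues TEXT NOT NULL
);
CREATE TABLE position_annotation (
    id INTEGER PRIMARY KEY,
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    position INTEGER NOT NULL,
    annotation TEXT NOT NULL
);
-- Structure section: experimentally determined structures
CREATE TABLE structure (
    id INTEGER PRIMARY KEY,
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    pdb_code TEXT NOT NULL,
    secondary_structure TEXT
);
"""


def build_demo_db():
    """Create an in-memory database and load placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO superfamily (id, name) VALUES (1, 'example superfamily')")
    conn.execute(
        "INSERT INTO homologous_family (id, name, superfamily_id) "
        "VALUES (1, 'example family', 1)"
    )
    conn.execute(
        "INSERT INTO protein (id, name, family_id) VALUES (1, 'example protein', 1)"
    )
    # Two placeholder sequences; only the first has a structure.
    conn.executemany(
        "INSERT INTO sequence (id, protein_id, residues) VALUES (?, ?, ?)",
        [(1, 1, "MKT"), (2, 1, "MKS")],
    )
    conn.execute(
        "INSERT INTO structure (id, sequence_id, pdb_code) VALUES (1, 1, 'XXXX')"
    )
    return conn


def count_by_superfamily(conn):
    """Return (superfamily name, sequence count, structure count) rows."""
    return conn.execute(
        """
        SELECT sf.name, COUNT(DISTINCT s.id), COUNT(DISTINCT st.id)
        FROM superfamily sf
        JOIN homologous_family hf ON hf.superfamily_id = sf.id
        JOIN protein p ON p.family_id = hf.id
        JOIN sequence s ON s.protein_id = p.id
        LEFT JOIN structure st ON st.sequence_id = s.id
        GROUP BY sf.id, sf.name
        ORDER BY sf.id
        """
    ).fetchall()


if __name__ == "__main__":
    print(count_by_superfamily(build_demo_db()))
```

A per-superfamily count of sequences and structures, like the release summary quoted in the abstract, then becomes a single join across the three sections. The `LEFT JOIN` keeps sequences that have no experimentally determined structure.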
FOS: Computer and information sciences, 570, QH301-705.5, Bioinformatics, Computer applications to medicine. Medical informatics, Molecular Sequence Data, R858-859.7, Information Storage and Retrieval, Bioengineering--Data processing, Database, User-Computer Interface, Sequence Analysis, Protein, Computer Graphics, Amino Acid Sequence, Biology (General), Databases, Protein, Proteins, Data warehousing, 004, Proteins--Data processing, Database Management Systems, Protein engineering, Sequence Alignment
| Indicator | Value |
| --- | --- |
| Selected citations | 47 |
| Popularity | Top 10% |
| Influence | Top 10% |
| Impulse | Top 10% |
