
pmid: 16873517
Abstract

Motivation: The classification of proteins expressed by an organism is an important step in understanding the molecular biology of that organism. Traditionally, this classification has been performed by human experts. Human knowledge can recognise the functional properties that are sufficient to place an individual gene product into a particular protein family group. Automation of this task usually fails to meet the ‘gold standard’ of the human annotator because of the difficult recognition stage. The growing number of genomes, the rapid changes in knowledge and the central role of classification in the annotation process, however, motivate the need to automate this process.

Results: We capture human understanding of how to recognise members of the protein phosphatases family by domain architecture as an ontology. By describing protein instances in terms of the domains they contain, it is possible to use description logic reasoners and our ontology to assign those proteins to a protein family class. We have tested our system on classifying the protein phosphatases of the human and Aspergillus fumigatus genomes and found that our knowledge-based, automatic classification matches, and sometimes surpasses, that of the human annotators. We have made the classification process fast and reproducible and, where appropriate knowledge is available, the method can potentially be generalised for use with any protein family.

Availability: All components described in this paper are freely available: the OWL ontology, myGrid and the Instance Store.

Contact: KWolstencroft@cs.man.ac.uk
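The idea in the Results paragraph, assigning a protein to a family class from the domains it contains, can be illustrated with a minimal sketch. This is not the authors' system: it replaces the OWL ontology and description logic reasoner with plain set matching, and every name in it (`FamilyDefinition`, `classify`, `FamilyA`, `domain_x` and so on) is a hypothetical placeholder rather than a real phosphatase family or domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyDefinition:
    """Hypothetical family definition: domains required and domains excluded."""
    name: str
    required: frozenset
    forbidden: frozenset = frozenset()


def classify(domains, definitions):
    """Return the most specific family names whose definition the protein satisfies."""
    present = set(domains)
    # A definition matches if all required domains are present
    # and none of the forbidden domains are.
    matches = [
        d for d in definitions
        if d.required <= present and not (d.forbidden & present)
    ]
    # Drop a match when another match requires a strict superset of its domains,
    # which loosely mimics placing an instance under the most specific class.
    return sorted(
        d.name for d in matches
        if not any(d.required < other.required for other in matches)
    )


# Placeholder definitions, invented purely for illustration.
DEFINITIONS = [
    FamilyDefinition("FamilyA", frozenset({"domain_x"})),
    FamilyDefinition("FamilyB", frozenset({"domain_x", "domain_y"})),
    FamilyDefinition("FamilyC", frozenset({"domain_z"}), frozenset({"domain_x"})),
]

print(classify(["domain_x", "domain_y"], DEFINITIONS))
print(classify(["domain_x"], DEFINITIONS))
```

A description logic reasoner does considerably more than this (for example, it infers the class hierarchy itself from the definitions), so the sketch only conveys the shape of the input and output: a set of domains in, a family class out.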
Keywords: Structure-Activity Relationship; Sequence Analysis, Protein; Molecular Sequence Data; Proteins; Expert Systems; Amino Acid Sequence; Sequence Alignment; Algorithms; Pattern Recognition, Automated
