Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices

Publication: Article, 01 Aug 2002, English
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics, volume 18, pages 1,102-1,108 (issn: 1367-4803, eissn: 1367-4811)

Authors: Cannata, N.; Toppo, Stefano; Romualdi, Chiara; Valle, Giorgio

doi: 10.1093/bioinformatics/18.8.1102

pmid: 12176833

handle: 11577/2429121, 11581/201339


Abstract

Motivation: Protein and DNA are generally represented by sequences of letters. In a number of circumstances simplified alphabets (where one or more letters would be represented by the same symbol) have proved their potential utility in several fields of bioinformatics including searching for patterns occurring at an unexpected rate, studying protein folding and finding consensus sequences in multiple alignments. The main issue addressed in this paper is the possibility of finding a general approach that would allow an exhaustive analysis of all the possible simplified alphabets, using substitution matrices like PAM and BLOSUM as a measure for scoring.

Results: The computational approach presented in this paper has led to a computer program called AlphaSimp (Alphabet Simplifier) that can perform an exhaustive analysis of the possible simplified amino acid alphabets, using a branch and bound algorithm together with standard or user-defined substitution matrices. The program returns a ranked list of the highest-scoring simplified alphabets. When the extent of the simplification is limited and the simplified alphabets are maintained above ten symbols the program is able to complete the analysis in minutes or even seconds on a personal computer. However, the performance becomes worse, taking up to several hours, for highly simplified alphabets.

Availability: AlphaSimp and other accessory programs are available at http://bioinformatics.cribi.unipd.it/alphasimp

Contact: giorgio.valle@unipd.it
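The abstract does not give AlphaSimp's scoring function or bounding rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a simplified alphabet is scored by summing substitution scores over all letter pairs merged into the same symbol, and it uses a hand-written toy score table rather than a real PAM or BLOSUM matrix. The names `TOY_SCORES` and `simplify_alphabet` are hypothetical. Unlike the program described above, the sketch returns only the single best alphabet instead of a ranked list.

```python
from typing import Dict, List, Tuple

# Hand-written toy similarity scores (illustrative only, not a full PAM/BLOSUM matrix).
TOY_SCORES: Dict[Tuple[str, str], int] = {
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
    ("D", "K"): -1, ("D", "R"): -2,
    ("I", "K"): -3, ("I", "R"): -3, ("I", "D"): -3,
    ("L", "K"): -2, ("L", "R"): -2, ("L", "D"): -4,
    ("V", "K"): -2, ("V", "R"): -3, ("V", "D"): -3,
}


def simplify_alphabet(
    letters: str, scores: Dict[Tuple[str, str], int], k: int
) -> Tuple[float, List[str]]:
    """Best partition of `letters` into exactly k groups (assumed scoring:
    sum of pairwise scores between letters placed in the same group)."""
    n = len(letters)

    def sim(a: str, b: str) -> int:
        # The table stores each unordered pair once.
        return scores[(a, b)] if (a, b) in scores else scores[(b, a)]

    # optimistic[i]: upper bound on what letters i..n-1 can still add,
    # obtained by counting only the positive scores with earlier letters.
    optimistic = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        cap = sum(max(0, sim(letters[i], letters[j])) for j in range(i))
        optimistic[i] = optimistic[i + 1] + cap

    best_score = float("-inf")
    best_groups: List[str] = []

    def search(i: int, groups: List[List[str]], score: int) -> None:
        nonlocal best_score, best_groups
        if i == n:
            if len(groups) == k and score > best_score:
                best_score = score
                best_groups = ["".join(g) for g in groups]
            return
        # Bound: even the optimistic completion cannot beat the incumbent.
        if score + optimistic[i] <= best_score:
            return
        # Feasibility: too few letters left to open the remaining groups.
        if n - i < k - len(groups):
            return
        letter = letters[i]
        # Branch 1: put the letter into each existing group.
        for idx in range(len(groups)):
            gain = sum(sim(letter, other) for other in groups[idx])
            groups[idx].append(letter)
            search(i + 1, groups, score + gain)
            groups[idx].pop()
        # Branch 2: open a new group (only at the end, which avoids
        # enumerating the same partition under different group orders).
        if len(groups) < k:
            groups.append([letter])
            search(i + 1, groups, score)
            groups.pop()

    search(0, [], 0)
    return best_score, best_groups


score, groups = simplify_alphabet("ILVKRD", TOY_SCORES, 3)
print(score, groups)
```

The pruning step is what separates this from plain enumeration: a branch is abandoned as soon as its current score plus the optimistic remainder cannot exceed the best complete alphabet found so far. How well such a bound prunes depends on the matrix and on the target alphabet size, which is consistent with the running times reported in the abstract varying from seconds to hours.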

Related Organizations

University of Padua
Italy
University of Camerino
Italy

Keywords

Internet; Models, Statistical; Sequence Homology, Amino Acid; Decision Trees; Molecular Sequence Data; Information Storage and Retrieval; Sensitivity and Specificity; Evaluation Studies as Topic; Sequence Analysis, Protein; Software Design; Database Management Systems; Programming Languages; Amino Acid Sequence; Databases, Protein; Algorithms

Impact by BIP!

selected citations: 34. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


gold

Fields of Science

medical and health sciences

basic medicine
