A distance-based block searching algorithm.

Publication: Article, Conference object. 01 Jan 1995. English. Journal: Proceedings. International Conference on Intelligent Systems for Molecular Biology, volume 3 (ISSN: 1553-0833)

Authors: Sagot, M-F.; Viari, Alain; Soldano, H.

pmid: 7584455


Abstract

We present in this paper an algorithm for the multiple comparison of a set of protein sequences. Our approach is that of peptide matching and consists of looking for all the words that occur approximately in at least q of the sequences in the set, where q is a parameter. Words are compared by using a reference object called a model, which is itself a word over the alphabet of the amino acids, and the comparison between a model and a word is based on w-length words instead of single symbols. This idea is similar to the one used in the BLAST program in the case of pairwise comparisons. Two w-length words are considered to be related if an ungapped alignment of the two, scored with a similarity matrix, has a score greater than a certain threshold value t. In our case, we say that a k-length word u is an occurrence of a model m of the same length if every w-length subword of u is related to the corresponding subword of m in the sense given above. If a model m has occurrences in at least q of the sequences of the set, m is said to occur in the set. In percentage terms, the value of q may correspond to something as small as 5% of the sequences (search for recurrent words in a set of non-homologous proteins) or as high as 70-100% (establishment of a list of all similar words as a first step in a multiple alignment program). The algorithm presented here is an efficient and exact way of looking for all the models, of a fixed length k or of the greatest possible length kmax, that occur in a set of sequences. It can work with any kind of scoring matrix, and an extension of the algorithm allows for the introduction of gaps between a model and its occurrences.
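The definitions in the abstract (related w-length words, occurrence of a model, and a model occurring in the set) can be illustrated with a naive checker. This is only a sketch of the definitions, not the efficient algorithm the paper presents: it tests one given model by brute force rather than enumerating all models. The function names and the `toy_score` function (+2 for identical residues, -1 otherwise) are assumptions made for illustration; a real run would presumably use a standard similarity matrix.

```python
from typing import Callable, Sequence

Score = Callable[[str, str], int]


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a similarity matrix: +2 identity, -1 mismatch."""
    return 2 if a == b else -1


def related(x: str, y: str, score: Score, t: int) -> bool:
    """Two equal-length words are related if their ungapped score exceeds t."""
    return len(x) == len(y) and sum(score(a, b) for a, b in zip(x, y)) > t


def is_occurrence(u: str, m: str, w: int, score: Score, t: int) -> bool:
    """u is an occurrence of model m if every w-length subword of u
    is related to the subword of m at the same position."""
    if len(u) != len(m) or len(m) < w:
        return False
    return all(
        related(u[i:i + w], m[i:i + w], score, t)
        for i in range(len(m) - w + 1)
    )


def model_occurs(m: str, sequences: Sequence[str], q: int,
                 w: int, score: Score, t: int) -> bool:
    """m occurs in the set if at least q sequences contain an occurrence of m."""
    k = len(m)
    hits = 0
    for s in sequences:
        # Slide a k-length window over the sequence and stop at the first hit.
        if any(is_occurrence(s[i:i + k], m, w, score, t)
               for i in range(len(s) - k + 1)):
            hits += 1
    return hits >= q


if __name__ == "__main__":
    seqs = ["GGACDEGG", "TTTT", "ACDF"]
    print(model_occurs("ACDF", seqs, q=2, w=3, score=toy_score, t=2))
```

With these toy settings (w = 3, t = 2), the word ACDE counts as an occurrence of the model ACDF because both of its 3-length subwords score above the threshold against the corresponding subwords of the model, while AXDE does not, since XDE against CDF scores only 0.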

Related Organizations

French National Centre for Scientific Research (France)
University of Lyon System (France)
Biometry and Evolutionary Biology Laboratory (France)
Claude Bernard University Lyon 1 (France)

Keywords

Sequence Homology, Amino Acid; [SDV.OT] Life Sciences [q-bio]/Other [q-bio.OT]; Molecular Sequence Data; Proteins; Models, Theoretical; Animals; Humans; Computer Simulation; Amino Acid Sequence; Algorithms; Software


Fields of Science

medical and health sciences

basic medicine
