Towards index-based similarity search for protein structure databases

Publication: Article, Conference object

Date: 23 Mar 2004

Publisher: IEEE Comput. Soc

Journal: Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003

Authors: Orhan Çamoglu; Tamer Kahveci; Ambuj K. Singh

doi: 10.1109/csb.2003.1227314

pmid: 16452789

Abstract

We propose two methods for finding similarities in protein structure databases. Both extract feature vectors from triplets of secondary structure elements (SSEs) of proteins and index these vectors with a multidimensional index structure. The first method addresses the problem of finding proteins similar to a given query protein in a protein dataset: it uses the index to quickly find promising proteins, which are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. The second method addresses the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques reduce the pruning time of VAST by a factor of 3 to 3.5 while keeping sensitivity similar.
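The filtering idea in the abstract (index feature vectors of SSE triplets, then shortlist candidate proteins for pairwise alignment) can be illustrated with a minimal sketch. The abstract does not define the features or the index, so everything here is an assumption made for illustration: each SSE is approximated as a 3D line segment, a triplet is described by three midpoint distances and three axis angles, and a uniform grid hash stands in for the multidimensional index structure. The names `triplet_features`, `TripletIndex`, and the cell widths are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: an SSE is a line segment ((x, y, z) start, (x, y, z) end)
# with non-zero length.

def midpoint(sse):
    start, end = sse
    return tuple((a + b) / 2 for a, b in zip(start, end))

def axis_angle(sse_a, sse_b):
    """Angle in radians between the axes of two SSE segments."""
    u = [e - s for s, e in zip(*sse_a)]
    v = [e - s for s, e in zip(*sse_b)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def triplet_features(s1, s2, s3):
    """Hypothetical 6-D feature: 3 midpoint distances + 3 axis angles."""
    pairs = [(s1, s2), (s1, s3), (s2, s3)]
    dists = [math.dist(midpoint(a), midpoint(b)) for a, b in pairs]
    angles = [axis_angle(a, b) for a, b in pairs]
    return tuple(dists + angles)

class TripletIndex:
    """Uniform grid hash over triplet features (a stand-in for a real
    multidimensional index; only the query's own cell is probed)."""

    def __init__(self, dist_cell=2.0, angle_cell=math.pi / 6):
        self.widths = (dist_cell,) * 3 + (angle_cell,) * 3
        self.cells = defaultdict(set)

    def _key(self, feat):
        return tuple(int(f // w) for f, w in zip(feat, self.widths))

    def add(self, protein_id, sses):
        # SSEs are taken in sequence order, so triplets are ordered consistently.
        for trip in combinations(sses, 3):
            self.cells[self._key(triplet_features(*trip))].add(protein_id)

    def candidates(self, query_sses, top_k=5):
        """Rank database proteins by how many query triplets hit their cells."""
        votes = Counter()
        for trip in combinations(query_sses, 3):
            for pid in self.cells.get(self._key(triplet_features(*trip)), ()):
                votes[pid] += 1
        return votes.most_common(top_k)

# Toy data: two made-up "proteins" with four SSE segments each.
prot_a = [((0, 0, 0), (10, 0, 0)), ((0, 5, 0), (0, 5, 10)),
          ((5, 5, 5), (5, 15, 5)), ((10, 10, 0), (20, 10, 10))]
prot_b = [((0, 0, 0), (0, 0, 10)), ((50, 0, 0), (50, 0, 10)),
          ((0, 50, 0), (0, 50, 10)), ((50, 50, 50), (50, 50, 60))]

index = TripletIndex()
index.add("A", prot_a)
index.add("B", prot_b)
print(index.candidates(prot_a))
```

In such a scheme, only the top-ranked candidates would be passed to the expensive pairwise alignment step. Probing a single grid cell misses near matches that fall just across a cell boundary; a range query on a proper multidimensional index would avoid that at the cost of a more involved implementation.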

Related Organizations

University of California, Santa Barbara
United States

Keywords

Molecular Sequence Data; Information Storage and Retrieval; Proteins; Pattern Recognition, Automated; Artificial Intelligence; Sequence Analysis, Protein; Database Management Systems; Amino Acid Sequence; Databases, Protein; Sequence Alignment; Algorithms

Impact by BIP!

selected citations: 11. These citations are derived from selected sources. This is an alternative to the "influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.