Towards automatic clustering of protein sequences

Publication: Article, Conference object
Date: 25 Jun 2003
Publisher: IEEE Comput. Soc
Journal: Proceedings. IEEE Computer Society Bioinformatics Conference

Authors: Jiong Yang 0001; Wei Wang 0010;

doi: 10.1109/csb.2002.1039340

pmid: 15838134

Abstract

Analyzing protein sequence data has become increasingly important. Most previous work in this area has focused on building classification models. In this paper, we investigate the problem of automatically clustering unlabeled protein sequences. As a widely recognized technique in statistics and computer science, clustering has proven very useful in detecting unknown object categories and revealing hidden correlations among objects. One difficulty that prevents clustering from being performed directly on protein sequences is the lack of an effective similarity measure that can be computed efficiently. Therefore, we propose a novel model for protein sequence clustering that explores significant statistical properties possessed by the sequences. The concept of imprecise probabilities is introduced into the original probabilistic suffix tree to monitor the convergence of the empirical measurement and to guide the clustering process. The proposed method is demonstrated to discover meaningful families without the need to learn models of different families from pre-labeled "training data".
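To make the idea of a suffix-based similarity measure concrete, the following is a minimal sketch of a variable-order context model in the spirit of a probabilistic suffix tree. It is not the authors' algorithm: it omits imprecise probabilities entirely, and the function names, the fixed maximum context depth, the `min_count` threshold, and the pseudo-count smoothing are all assumptions made for illustration. The average log-likelihood of a sequence under a model built from a candidate cluster serves here as a rough similarity score.

```python
import math
from collections import defaultdict

# The 20 standard amino acids, used as the alphabet for smoothing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_context_counts(sequences, max_depth=2):
    """Count next-symbol occurrences for every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for depth in range(0, max_depth + 1):
                if i - depth < 0:
                    break
                context = seq[i - depth:i]  # the `depth` symbols preceding position i
                counts[context][symbol] += 1
    return counts


def next_symbol_probability(counts, history, symbol,
                            max_depth=2, min_count=2, pseudo=0.01):
    """Smoothed P(symbol | history), using the longest suffix of `history`
    that was observed at least `min_count` times (hypothetical threshold)."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        context = history[len(history) - depth:]
        followers = counts[context] if context in counts else {}
        total = sum(followers.values())
        if total >= min_count or depth == 0:
            observed = followers.get(symbol, 0)
            return (observed + pseudo) / (total + pseudo * len(AMINO_ACIDS))


def average_log_likelihood(counts, seq, max_depth=2):
    """Per-symbol log-likelihood of `seq` under the context model."""
    total = 0.0
    for i, symbol in enumerate(seq):
        total += math.log(
            next_symbol_probability(counts, seq[:i], symbol, max_depth))
    return total / len(seq)


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    cluster = ["ACACACAC", "ACACAC"]
    model = build_context_counts(cluster)
    print(average_log_likelihood(model, "ACAC"))  # resembles the cluster
    print(average_log_likelihood(model, "WYWY"))  # does not
```

A clustering loop could assign each unlabeled sequence to the cluster whose model gives it the highest score and then rebuild the models, though the criteria the paper uses to guide that process are not specified in this abstract.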

Related Organizations

North Carolina Agricultural and Technical State University, United States
IBM (United States), United States
IBM Research – Thomas J. Watson Research Center, United States

Keywords

Sequence Homology, Amino Acid; Artificial Intelligence; Sequence Analysis, Protein; Molecular Sequence Data; Cluster Analysis; Proteins; Amino Acid Sequence; Sequence Alignment; Algorithms; Pattern Recognition, Automated

Impact by BIP!

selected citations: 4
popularity: Average
influence: Average
impulse: Average