
doi: 10.1109/icit.2017.55
Remote homology detection at the amino acid level is a complex problem in bioinformatics. Sequence similarity between remote homologs is negligible compared with their clan-based resemblance. Customary detection methods may therefore be replaced by modern Support Vector Machine (SVM) based approaches, in which a sequence is represented by significant feature vectors. Such approaches tend to give superior accuracy over profile-based approaches. In this work, effort has been made in two directions. In the first approach, 2-mers of amino acids are generated from protein sequences, and various physicochemical parameters are used to generate the feature vector. In the second approach, the properties of amino acids are used to create the feature vectors from 3-mers in a similar manner. After the feature vectors were generated by the two approaches, PCA (Principal Component Analysis) was used for dimensionality reduction and an SVM was used for classification. The accuracy of the proposed methods is compared with that of other existing methods. It was observed that the approach performed similarly for some protein families and better for some other groups of families. This work also proposes that physicochemical parameters can be used to detect remote homology in protein sequences.
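The abstract does not specify how physicochemical parameters are combined with k-mers, so the following is only a minimal sketch under stated assumptions. It uses the Kyte-Doolittle hydropathy index as a single example parameter, and the weighting scheme (mean property value of each k-mer, averaged over all k-mers in the sequence) as well as the names `KD`, `kmer_index`, and `kmer_feature_vector` are illustrative choices, not the authors' method.

```python
from itertools import product

# Kyte-Doolittle hydropathy index: one example physicochemical parameter.
# Assumption: the paper's actual parameter set is not given in the abstract.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def kmer_index(k):
    """Map every possible k-mer over the 20 amino acids to a vector position."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def kmer_feature_vector(seq, k=2, scale=KD):
    """Build a fixed-length feature vector (20**k entries) for one sequence.

    Hypothetical scheme: each k-mer occurrence contributes the mean property
    value of its residues; totals are divided by the number of valid k-mers.
    """
    index = kmer_index(k)
    vec = [0.0] * len(index)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip k-mers containing non-standard residues such as X or B.
    valid = [m for m in kmers if m in index]
    if not valid:
        return vec
    for m in valid:
        vec[index[m]] += sum(scale[a] for a in m) / k
    return [v / len(valid) for v in vec]


if __name__ == "__main__":
    v2 = kmer_feature_vector("MKVLAAGIV", k=2)  # 400-dimensional (first approach)
    v3 = kmer_feature_vector("MKVLAAGIV", k=3)  # 8000-dimensional (second approach)
    print(len(v2), len(v3))
```

Vectors of this kind, stacked into a matrix with one row per sequence, could then be reduced with a PCA implementation and passed to an SVM classifier, for example scikit-learn's `PCA` and `SVC`. The number of retained components and the kernel are not stated in the abstract and would need to be chosen experimentally.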
