A Semi-Supervised Autoencoder-Based Approach for Protein Function Prediction

Publication: Article, 01 Oct 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Journal of Biomedical and Health Informatics, volume 26, pages 4,957-4,965 (ISSN: 2168-2194, eISSN: 2168-2208)

Authors: Richa Dhanuka; Anushree Tripathi; Jyoti P. Singh

doi: 10.1109/jbhi.2022.3163150

pmid: 35349463

Abstract

After the development of next-generation sequencing techniques, protein sequences are abundantly available. Determining the functional characteristics of these proteins is costly and time-consuming. The gap between the number of protein sequences and their corresponding functions is continuously increasing. Advanced machine-learning methods have stepped up to fill this gap. In this work, an advanced deep-learning-based approach is proposed for protein function prediction using protein sequences. A set of autoencoders is trained in a semi-supervised manner with protein sequences. Each autoencoder corresponds to a single protein function only. In particular, 932 autoencoders corresponding to 932 biological processes and 585 autoencoders corresponding to 585 molecular functions are trained separately. The reconstruction loss of each protein sample under every autoencoder is used as a feature to classify these sequences into their corresponding functions. The proposed model is evaluated on test protein samples and achieves promising results. This method can be easily extended to predict any number of functions that have an ample number of supporting protein sequences. All relevant code, data, and trained models are available at https://github.com/richadhanuka/PFP-Autoencoders.
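
The core idea in the abstract, one autoencoder per function with per-autoencoder reconstruction losses used as features, can be illustrated with a minimal sketch. This is not the authors' implementation: a rank-1 linear autoencoder fitted by power iteration stands in for the deep autoencoders, the inputs are assumed to be fixed-length numeric vectors already derived from sequences, and the names `function_A`, `function_B`, `fit_linear_autoencoder`, `loss_features`, and `predict` are hypothetical. Picking the lowest loss is only a single-label stand-in for the classifier the paper trains on these features.

```python
import math

def fit_linear_autoencoder(rows, iters=200):
    """Fit a rank-1 linear autoencoder: the mean plus the top principal direction.

    Assumption: this stands in for a deep autoencoder trained only on
    samples annotated with one function.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    w = [1.0 / math.sqrt(d)] * d  # deterministic start vector
    for _ in range(iters):
        # Power iteration on the covariance: w <- X^T (X w), then normalize.
        scores = [sum(c[j] * w[j] for j in range(d)) for c in centered]
        w_new = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0.0:
            break  # start vector orthogonal to the data; keep the current w
        w = [v / norm for v in w_new]
    return mean, w

def reconstruction_loss(model, x):
    """Mean squared error between x and its encode/decode round trip."""
    mean, w = model
    centered = [xj - mj for xj, mj in zip(x, mean)]
    code = sum(c * wj for c, wj in zip(centered, w))  # encoder: 1-D code
    recon = [code * wj for wj in w]                   # decoder
    return sum((c - r) ** 2 for c, r in zip(centered, recon)) / len(x)

def loss_features(models, x):
    """One reconstruction loss per autoencoder, in sorted function-name order."""
    return [reconstruction_loss(models[name], x) for name in sorted(models)]

def predict(models, x):
    """Toy stand-in for the downstream classifier: pick the lowest loss."""
    names = sorted(models)
    feats = loss_features(models, x)
    return names[feats.index(min(feats))]

# Toy training sets (made-up vectors), one per hypothetical function label.
train = {
    "function_A": [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]],
    "function_B": [[0.0, 1.0, 5.0], [0.0, 2.0, 5.0], [0.0, 3.0, 5.0], [0.0, 4.0, 5.0]],
}
models = {name: fit_linear_autoencoder(rows) for name, rows in train.items()}

sample = [5.0, 0.0, 0.0]
print(loss_features(models, sample))  # low loss under function_A, high under function_B
print(predict(models, sample))
```

A sample that resembles the training data of one function is reconstructed well by that function's autoencoder and poorly by the others, so the vector of losses carries the class signal. With 932 or 585 autoencoders, the feature vector would simply have that many entries.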

Related Organizations

National Institute of Technology Patna
India

Keywords

Machine Learning, Humans, Proteins

Impact by BIP!

selected citations: 8
popularity: Top 10%
influence: Average
impulse: Top 10%