SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data

Article, published 11 Oct 2023 (United States, English). Publisher: Oxford University Press (OUP). Journal: NAR Genomics and Bioinformatics, volume 5 (eISSN: 2631-9268).

Authors: Di Xiao; Michael Lin; Chunlei Liu; Thomas A Geddes; James G Burchfield; Benjamin L Parker; Sean J Humphrey; and 1 more author

doi: 10.1093/nargab/lqad099 , 10.5281/zenodo.10038862 , 10.5281/zenodo.10038861

pmid: 37954574

pmc: PMC10632189


Abstract

A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of identified substrates can be confidently linked to a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict kinase substrates. However, the small number of experimentally validated kinase substrates (true positives) and the high level of noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods that address these challenges. Using a collection of seven large phosphoproteomics datasets and both traditional and deep learning models, we first demonstrate that a ‘pseudo-positive’ learning strategy, designed to alleviate the small sample size, is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy improves model stability while further enhancing prediction. Lastly, we introduce SnapKin, an ensemble deep learning model for predicting kinase substrates from large-scale phosphoproteomics data, which incorporates the above two learning strategies into a ‘snapshot’ ensemble learning algorithm. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
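In the general literature (Huang et al., 2017, "Snapshot Ensembles: Train 1, get M for free"), a ‘snapshot’ ensemble means training one model with a cyclic, cosine-annealed learning rate. The weights are saved at the end of each cycle, and the predictions of the saved models are averaged. The sketch below illustrates only that generic technique and rests on several assumptions. A logistic-regression model stands in for SnapKin's deep network, and random toy features stand in for phosphosite profiles. The function names, cycle count, and learning rates are hypothetical choices, not SnapKin's actual settings. The sketch does not implement the pseudo-positive or resampling strategies.

```python
# Hypothetical sketch of a generic snapshot ensemble (Huang et al., 2017).
# Logistic regression is a stand-in for SnapKin's deep network; all names and
# hyperparameters here are illustrative assumptions, not SnapKin's own.
import math

import numpy as np


def cyclic_cosine_lr(step, total_steps, n_cycles, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle.

    `step` is 0-indexed; within a cycle the rate decays from lr_max toward 0.
    """
    cycle_len = math.ceil(total_steps / n_cycles)
    t = step % cycle_len
    return lr_max / 2 * (math.cos(math.pi * t / cycle_len) + 1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_snapshot_ensemble(X, y, total_steps=300, n_cycles=3, lr_max=0.5, seed=0):
    """Run one gradient-descent training loop and save a weight snapshot
    at the end of every learning-rate cycle."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    cycle_len = math.ceil(total_steps / n_cycles)
    snapshots = []
    for step in range(total_steps):
        lr = cyclic_cosine_lr(step, total_steps, n_cycles, lr_max)
        p = sigmoid(X @ w + b)
        grad = p - y  # gradient of the logistic loss w.r.t. the logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
        # End of a cycle: the rate has annealed close to 0, so keep a snapshot.
        if (step + 1) % cycle_len == 0 or step + 1 == total_steps:
            snapshots.append((w.copy(), b))
    return snapshots


def predict_ensemble(snapshots, X):
    """Average the predicted probabilities of all saved snapshots."""
    return np.mean([sigmoid(X @ w + b) for w, b in snapshots], axis=0)


# Toy usage on synthetic data (not phosphoproteomics data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
snaps = train_snapshot_ensemble(X, y)
probs = predict_ensemble(snaps, X)
print(len(snaps), probs.shape)  # prints: 3 (200,)
```

All snapshots come from a single training run, so the ensemble costs roughly as much as training one model. In a non-convex deep network, each learning-rate restart can move the weights toward a different solution, which is why averaging snapshots may yield more stable predictions than any single one.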

Country

United States

Related Organizations

- California Digital Library, United States
- University of Sydney, Australia
- University of California, United States
- Children's Medical Research Institute, Australia
- Murdoch Children's Research Institute, Australia


Keywords

Methods Article

Impact (BIP!)

- Selected citations: 1 (citations derived from selected sources; an alternative to the Influence indicator)
- Popularity: Average (current impact/attention of the article, based on the underlying citation network)
- Influence: Average (overall impact of the article over time, based on the underlying citation network)
- Impulse: Average (initial momentum of the article directly after publication)
