Publication . Article . 2021

DeepSec: a deep learning framework for secreted protein discovery in human body fluids

Dan Shao; Lan Huang; Yan Wang; Kai He; Xueteng Cui; Yao Wang; Qin Ma; +1 Authors
Open Access  
Published: 01 Aug 2021 Journal: Bioinformatics, volume 38, pages 228-235 (ISSN: 1367-4803, eISSN: 1460-2059)
Publisher: Oxford University Press (OUP)
Abstract

Motivation: Human proteins that are secreted into different body fluids from various cells and tissues can be promising disease indicators. Modern proteomics research, empowered by both qualitative and quantitative profiling techniques, has made great progress in protein discovery in various human fluids. However, due to the large number of proteins and diverse modifications present in the fluids, as well as the technical limits of major proteomics platforms (e.g. mass spectrometry), different experimental studies often report widely discrepant results. As a result, a comprehensive proteomics landscape across major human fluids is not well determined.

Results: To bridge this gap, we have developed a deep learning framework, named DeepSec, to identify secreted proteins in 12 types of human body fluids. DeepSec adopts an end-to-end sequence-based approach, in which a Convolutional Neural Network learns abstract sequence features and is followed by a Bidirectional Gated Recurrent Unit with a fully connected layer for protein classification. DeepSec has demonstrated promising performance, with average areas under the ROC curve of 0.85–0.94 on the testing datasets for each fluid type, outperforming existing state-of-the-art methods, which are available mostly for blood proteins. As an illustration of how to apply DeepSec in biomarker discovery research, we conducted a case study on kidney cancer using genomics data from The Cancer Genome Atlas and identified 104 possible marker proteins.

Availability: DeepSec is available at

Supplementary information: Supplementary data are available at Bioinformatics online.
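The abstract reports performance as the area under the ROC curve (AUC) but does not say how it was computed. As a minimal sketch, assuming binary labels (1 = secreted into a given fluid, 0 = not) and one classifier score per protein, the AUC can be obtained from the rank-based (Mann-Whitney) formulation below. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not taken from DeepSec or its datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of proteins, which is fine for a small illustration; for large test sets a sort-based implementation or an established library routine would be the more practical choice.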
Subjects by Vocabulary

Microsoft Academic Graph classification: Genomics, Human proteins, Computer science, Biomarker discovery, Profiling (information science), Cancer genome, Computational biology, Deep learning, Convolutional neural network, Artificial intelligence, Proteomics


Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability, Original Papers, Data and Text Mining, AcademicSubjects/SCI01060

Related Organizations