publication . Article . Other literature type . 2017

A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits

Kang K. Yan; Hongyu Zhao; Herbert Pang
Open Access English
  • Published: 01 Dec 2017 Journal: BMC Bioinformatics, volume 18 (eissn: 1471-2105)
  • Publisher: BioMed Central
  • Country: China (People's Republic of)
Abstract
Background High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in the quest to improve human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. To provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. Results In this paper, we focus on two common classes of integration algorithms, graph-base...
Subjects
free text keywords: Research Article, Bayesian network, Relevance vector machine, Graph-based semi-supervised learning, Semi-definite programming (SDP)-support vector machine, Multiple data sources, Classification, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5, Biochemistry, Applied Mathematics, Molecular Biology, Structural Biology, Computer Science Applications, Kernel (linear algebra), Feature vector, Algorithm, Classifier (linguistics), Data set, Data integration, Support vector machine, Biology
Related Organizations
Funded by
NIH| Characterization of Predictive Biomarkers for the Clinical Efficacy of PHY906
Project
  • Funder: National Institutes of Health (NIH)
  • Project Code: 5P01CA154295-02
  • Funding stream: NATIONAL CANCER INSTITUTE
NIH| Identifying T2D Variants by DNA Sequencing in Multiethnic Samples
Project
  • Funder: National Institutes of Health (NIH)
  • Project Code: 1U01DK085584-01
  • Funding stream: NATIONAL INSTITUTE OF DIABETES AND DIGESTIVE AND KIDNEY DISEASES
NIH| Identification and Replication of Type 2 Diabetes Genes in Mexican Americans
Project
  • Funder: National Institutes of Health (NIH)
  • Project Code: 5U01DK085501-02
  • Funding stream: NATIONAL INSTITUTE OF DIABETES AND DIGESTIVE AND KIDNEY DISEASES
NIH| GWAS for Sleep Apnea and Endothelial Function Among Mexican Americans
Project
  • Funder: National Institutes of Health (NIH)
  • Project Code: 1R01HL102830-01
  • Funding stream: NATIONAL HEART, LUNG, AND BLOOD INSTITUTE
NIH| Statistical Methods to Map Genes for Complex Traits
Project
  • Funder: National Institutes of Health (NIH)
  • Project Code: 5R01GM059507-08
  • Funding stream: NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES