Predicting chemoinsensitivity in breast cancer with ’omics/digital pathology data fusion
Savage, Richard S.
- Publisher: The Royal Society Publishing
Royal Society Open Science
Bayesian | Genetics | Research Article | breast cancer | data integration
Predicting response to treatment and disease-specific death are key tasks in cancer research, yet methodologies for these tasks are lacking. Large-scale ’omics and digital pathology technologies have created a need for effective statistical methods for data fusion that extract the most useful patterns from these diverse data types. We present FusionGP, a method for combining heterogeneous data types designed specifically for predicting treatment and disease outcomes. FusionGP is a Gaussian process model that includes a generalization of feature selection for biomarker discovery, allowing for simultaneous, sparse feature selection across multiple data types. Importantly, it can accommodate highly nonlinear structure in the data, and it automatically infers the optimal contribution from each input data type. FusionGP compares favourably to several popular classification methods applied to single data types, including the Random Forest classifier, a stepwise logistic regression model and the Support Vector Machine. By combining gene expression, copy number alteration and digital pathology image data in 119 estrogen receptor (ER)-negative and 345 ER-positive breast tumours, we aim to predict two important clinical outcomes: death and chemoinsensitivity. While gene expression data give the best predictive performance in the majority of cases, the digital pathology data are much better for predicting death in ER cases. Thus, FusionGP is a new tool for selecting informative features from heterogeneous data types and predicting treatment response and prognosis.
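As an illustration of the general idea of Gaussian process data fusion, the following minimal sketch builds one kernel per data type and combines them in a weighted sum, so that the weights play the role of each data type's contribution. It is not the FusionGP implementation: the function names (`ard_rbf`, `fused_kernel`, `gp_predict`), the squared-exponential kernel with per-feature length-scales, the regression-on-labels surrogate for classification, the fixed (rather than inferred) weights and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def ard_rbf(A, B, lengthscales):
    """Squared-exponential kernel with one length-scale per feature.
    A very large length-scale effectively switches that feature off."""
    A = A / lengthscales
    B = B / lengthscales
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))

def fused_kernel(blocks_a, blocks_b, weights, lengthscales):
    """Weighted sum of per-data-type kernels (hypothetical fusion rule)."""
    return sum(w * ard_rbf(Xa, Xb, ls)
               for Xa, Xb, w, ls in zip(blocks_a, blocks_b, weights, lengthscales))

def gp_predict(train_blocks, y, test_blocks, weights, lengthscales, noise=0.1):
    """GP regression posterior mean on +/-1 labels, used here as a simple
    stand-in for a proper GP classifier."""
    K = fused_kernel(train_blocks, train_blocks, weights, lengthscales)
    K_star = fused_kernel(test_blocks, train_blocks, weights, lengthscales)
    alpha = np.linalg.solve(K + noise * np.eye(len(y)), y)
    return K_star @ alpha

# Synthetic stand-ins for two data types (not real tumour data).
rng = np.random.default_rng(0)
n_train, n_test = 60, 20
expr_train, expr_test = rng.normal(size=(n_train, 5)), rng.normal(size=(n_test, 5))
img_train, img_test = rng.normal(size=(n_train, 3)), rng.normal(size=(n_test, 3))
y_train = np.where(expr_train[:, 0] > 0, 1.0, -1.0)

weights = [0.7, 0.3]                       # assumed fixed; a full model would infer these
lengthscales = [np.array([1.0, 1e3, 1e3, 1e3, 1e3]),  # only the first expression feature is active
                np.ones(3)]

scores = gp_predict([expr_train, img_train], y_train,
                    [expr_test, img_test], weights, lengthscales)
predicted_labels = np.where(scores > 0, 1, -1)
```

In a full model the weights and length-scales would be inferred from the data rather than set by hand; here they are fixed only to keep the sketch short.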