Mining the forest: uncovering biological mechanisms by interpreting Random Forests

Publication: Article, Other literature type, Preprint. 10 Nov 2017. Publisher: openRxiv. Funded by: NWO | Scale-space theory: A new...

Authors: de Ruiter, Julian; Knijnenburg, Theo; de Ridder, Jeroen

doi: 10.1101/217695

Abstract

Biological datasets are large and complex, so machine learning models are essential for capturing the relationships in the data. Unfortunately, the inferred models are often difficult to understand, and interpretation is limited to a list of features ranked by their importance in the model. We propose a computational approach, called Foresight, that enables interpretation of the patterns uncovered by Random Forest models trained on biological datasets. Foresight exploits the correlation structure in the data to uncover relevant groups of features and the interactions between them. This facilitates interpretation of the computational model and can provide more detailed insight into the underlying biological relationships than simply ranking features. We demonstrate Foresight on both an artificial dataset and a large gene expression dataset of breast cancer patients. Using the latter dataset, we show that our approach retrieves biologically relevant features and provides a rich description of the interactions and correlation structure between these features.
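The abstract does not spell out how Foresight uses the correlation structure, so the following is only a generic illustration of the underlying idea of grouping correlated features, not the Foresight algorithm itself. It links any two features whose absolute Pearson correlation reaches a threshold and reports the resulting connected components as groups. The function names, the 0.8 threshold, and the toy feature names and values are all assumptions made for this sketch.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlated_groups(features, threshold=0.8):
    """Group features linked by |r| >= threshold (connected components).

    `threshold` is an arbitrary illustrative value, not one from the paper.
    """
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if abs(pearson(features[a], features[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data: hypothetical feature names and values, not from the paper.
toy = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # roughly 2 * gene_a
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to the others
}
groups = correlated_groups(toy)
print(groups)
```

In a Random Forest setting, groups like these could be used to read an importance ranking at the level of correlated feature sets rather than single features, since importance may be split among features that carry overlapping information.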

Related Organizations

Delft University of Technology, Netherlands
Netherlands Heart Institute, Netherlands
Institute for Systems Biology, United States


Impact by BIP!

Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.