Stable Graphical Model Estimation with Random Forests for Discrete, Continuous, and Mixed Variables
von Rhein, Michael
Reinhardt, Jan D.
Statistics - Methodology | Statistics - Applications | Statistics - Computation
A conditional independence graph is a concise representation of pairwise conditional independence among many variables. Graphical Random Forests (GRaFo) are a novel method for estimating pairwise conditional independence relationships among mixed-type, i.e., continuous and discrete, variables. The number of edges is a tuning parameter in any graphical model estimator, and there is no obvious choice for it. Stability Selection helps to choose this parameter with respect to a bound on the expected number of false positives (error control). The performance of GRaFo is evaluated and compared with various other methods for p = 50, 100, and 200 possibly mixed-type variables, with a sample size of n = 100 (n = 500 for maximum likelihood). Furthermore, GRaFo is applied to data from the Swiss Health Survey in order to evaluate how well it can reproduce the interconnection of functional health components, personal factors, and environmental factors, as hypothesized by the World Health Organization's International Classification of Functioning, Disability and Health (ICF). Finally, GRaFo is used to identify risk factors that may be associated with adverse neurodevelopment of children with trisomy 21 who underwent open-heart surgery. GRaFo performs well with mixed data and, thanks to Stability Selection, provides an error control mechanism for false positive selection.
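
The role of Stability Selection in choosing the number of edges can be illustrated with a minimal sketch. It assumes the standard Meinshausen and Bühlmann bound E[V] <= q^2 / ((2 * pi_thr - 1) * m), where V is the number of falsely selected edges, q is the number of edges selected per subsample, pi_thr is the selection-frequency threshold, and m = p(p - 1)/2 is the number of candidate edges. The function names, the threshold of 0.75, and the bound of one expected false positive are illustrative assumptions, not settings taken from this abstract, and the bound itself holds only under the conditions stated in the Stability Selection literature.

```python
import math


def num_candidate_edges(p):
    """Number of possible undirected edges among p variables."""
    return p * (p - 1) // 2


def max_edges_per_subsample(p, pi_thr, ev_bound):
    """Largest q such that q^2 / ((2*pi_thr - 1) * m) <= ev_bound.

    p        : number of variables (nodes)
    pi_thr   : selection-frequency threshold, must lie in (0.5, 1]
    ev_bound : desired upper bound on the expected number of false positives
    """
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    if ev_bound <= 0:
        raise ValueError("ev_bound must be positive")
    m = num_candidate_edges(p)
    return math.floor(math.sqrt(ev_bound * (2.0 * pi_thr - 1.0) * m))


def stable_edges(selection_freq, pi_thr):
    """Keep edges whose selection frequency across subsamples reaches pi_thr.

    selection_freq maps an edge (i, j) to the fraction of subsamples in which
    that edge was among the q top-ranked edges (hypothetical input format).
    """
    return {edge for edge, freq in selection_freq.items() if freq >= pi_thr}


if __name__ == "__main__":
    # Illustrative settings only: threshold 0.75, at most one expected false positive.
    for p in (50, 100, 200):
        q = max_edges_per_subsample(p, pi_thr=0.75, ev_bound=1.0)
        print(f"p = {p}: select at most q = {q} edges per subsample")
```

In this sketch the user fixes the error bound and the threshold, and the number of edges per subsample follows from them, which is the sense in which the error bound replaces the edge count as the quantity to be chosen.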