Hyperparameters and tuning strategies for random forest

Publication type: Article, Preprint, Other literature type

Publication date: 28 Jan 2019

Embargo end date: 01 Jan 2018

Language: English

Publisher: Wiley

Journal: WIREs Data Mining and Knowledge Discovery, volume 9 (ISSN: 1942-4787, eISSN: 1942-4795)

Funded by: DFG | unidentified

Authors: Philipp Probst; Marvin N. Wright; Anne-Laure Boulesteix

doi: 10.1002/widm.1301 , 10.48550/arxiv.1804.03515

arXiv: 1804.03515


Abstract

The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after presenting a brief overview of tuning strategies, we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.

This article is categorized under: Algorithmic Development > Biological Data Mining; Algorithmic Development > Statistics; Algorithmic Development > Hierarchies and Trees; Technologies > Machine Learning
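As an illustration of the two sampling hyperparameters named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation and not the tuneRanger or ranger code. The function names `draw_tree_sample` and `draw_split_candidates` are hypothetical, and the parameter names `sample_fraction`, `replace`, and `mtry` are chosen here for illustration only. It shows how one might draw the observations for a single tree (with or without replacement) and the candidate variables for a single split.

```python
import random


def draw_tree_sample(n_obs, sample_fraction=1.0, replace=True, rng=None):
    """Return the observation indices used to grow one tree (illustrative)."""
    if rng is None:
        rng = random.Random()
    size = max(1, round(sample_fraction * n_obs))
    if replace:
        # Drawing with replacement: an index may appear more than once.
        return [rng.randrange(n_obs) for _ in range(size)]
    # Drawing without replacement: each index appears at most once,
    # so the sample cannot be larger than the dataset.
    return rng.sample(range(n_obs), min(size, n_obs))


def draw_split_candidates(n_vars, mtry, rng=None):
    """Return the variable indices considered at one split (illustrative)."""
    if rng is None:
        rng = random.Random()
    # The candidate variables are always distinct.
    return rng.sample(range(n_vars), min(mtry, n_vars))


# Example: 100 observations, 9 variables, 3 candidate variables per split.
rng = random.Random(42)
in_bag = draw_tree_sample(100, sample_fraction=0.7, replace=False, rng=rng)
candidates = draw_split_candidates(9, mtry=3, rng=rng)
print(len(in_bag), sorted(candidates))
```

In a real forest these draws would be repeated for every tree and every split; the sketch leaves out tree growing, the splitting rule, the minimum node size, and the number of trees, which the abstract lists as further hyperparameters.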

Related Organizations

Ludwig-Maximilians-Universität München
Germany
Leibniz Association
Germany
Leibniz Institute for Prevention Research and Epidemiology - BIPS
Germany

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)

Impact by BIP!

- Selected citations: 1K. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 0.01%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 0.01%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
