The use of vicinal-risk minimization for training decision trees

Article English OPEN
Cao, Y. ; Rockett, P.I. (2015)
  • Publisher: Elsevier

We propose the use of Vapnik's vicinal risk minimization (VRM) for training decision trees to approximately maximize decision margins. We implement VRM by propagating uncertainties in the input attributes into the labeling decisions, which amounts to a global regularization over the decision tree structure. During training, a decision tree is constructed to minimize the total probability of misclassifying the labeled training examples, a process that approximately maximizes the margins of the resulting classifier. We perform the necessary minimization using a meta-heuristic (genetic programming). Over a range of synthetic and real benchmark datasets, VRM training is statistically superior to both conventional empirical risk minimization (ERM) and the well-known C4.5 algorithm. We also find no statistically significant difference between trees trained by ERM and those trained using C4.5. Training with VRM is shown to be more stable and repeatable than training with ERM.
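To make the idea of propagating input uncertainty into the labeling decision concrete, the following minimal sketch compares empirical and vicinal risk for a single-threshold decision stump on one attribute. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian vicinity with a fixed width `sigma`, the toy data, the grid search over thresholds (the paper uses genetic programming over whole trees), and all function names are hypothetical choices made here.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_risk(xs, ys, threshold):
    """Fraction of training points misclassified by the rule 'x > threshold -> class 1'."""
    errors = sum(1 for x, y in zip(xs, ys) if (1 if x > threshold else 0) != y)
    return errors / len(xs)


def vicinal_risk(xs, ys, threshold, sigma):
    """Mean probability of misclassification when each attribute value is
    treated as uncertain: x_i + N(0, sigma^2) (an assumed Gaussian vicinity)."""
    total = 0.0
    for x, y in zip(xs, ys):
        # Probability that the perturbed point falls on the class-1 side.
        p_class1 = normal_cdf((x - threshold) / sigma)
        # A class-1 point is misclassified when it falls on the class-0 side,
        # and a class-0 point when it falls on the class-1 side.
        total += (1.0 - p_class1) if y == 1 else p_class1
    return total / len(xs)


# Toy one-dimensional data: class 0 on the left, class 1 on the right.
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
sigma = 0.5  # assumed attribute uncertainty

# Candidate thresholds 1.1, 1.2, ..., 2.9; all of them separate the classes.
candidates = [k / 10 for k in range(11, 30)]

best_threshold = min(candidates, key=lambda t: vicinal_risk(xs, ys, t, sigma))

for t in (1.1, 2.0, 2.9):
    print(t, empirical_risk(xs, ys, t), round(vicinal_risk(xs, ys, t, sigma), 4))
print("threshold with lowest vicinal risk:", best_threshold)
```

Every candidate threshold has zero empirical risk on this data, so ERM cannot choose among them. The vicinal risk is lowest for the threshold midway between the two classes, which is the sense in which minimizing the total probability of misclassification approximately maximizes the margin.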
