Bregman Divergences and Surrogates for Learning

descriptionPublicationkeyboard_double_arrow_right Article 01 Nov 2009Publisher:Institute of Electrical and Electronics Engineers (IEEE)Journal:IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 31, pages 2,048-2,059 (issn: 0162-8828,

Copyright policy )

Authors: Richard Nock; Frank Nielsen;

doi: 10.1109/tpami.2008.225

pmid: 19762930

Bregman Divergences and Surrogates for Learning

- Summary
- Subjects
- Metrics

Abstract

Bartlett et al. (2006) recently proved that a ground condition for surrogates, classification calibration, ties up their consistent minimization to that of the classification risk, and left as an important problem the algorithmic questions about their minimization. In this paper, we address this problem for a wide set which lies at the intersection of classification calibrated surrogates and those of Murata et al. (2004). This set coincides with those satisfying three common assumptions about surrogates. Equivalent expressions for the members-sometimes well known-follow for convex and concave surrogates, frequently used in the induction of linear separators and decision trees. Most notably, they share remarkable algorithmic features: for each of these two types of classifiers, we give a minimization algorithm provably converging to the minimum of any such surrogate. While seemingly different, we show that these algorithms are offshoots of the same "master" algorithm. This provides a new and broad unified account of different popular algorithms, including additive regression with the squared loss, the logistic loss, and the top-down induction performed in CART, C4.5. Moreover, we show that the induction enjoys the most popular boosting features, regardless of the surrogate. Experiments are provided on 40 readily available domains.

Related Organizations

École Normale Supérieure
France
École Polytechnique
France
Université des Antilles Pôle Martinique
Martinique

Keywords

Artificial Intelligence, Computer Simulation, Models, Theoretical, Algorithms, Decision Support Techniques, Pattern Recognition, Automated

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	31
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%