Boosting classifiers for drifting concepts

Publication: Article, Report, Research, Preprint; 15 Mar 2007; Embargo end date: 16 Mar 2006; Germany; Publisher: SAGE Publications; Journal: Intelligent Data Analysis, volume 11, pages 3-28 (ISSN: 1088-467X, eISSN: 1571-4128)

Authors: Scholz, Martin; Klinkenberg, Ralf

doi: 10.3233/ida-2007-11102, 10.17877/de290r-14320

handle: 10419/22652, 2003/22236

Abstract

In many real-world classification tasks, data arrives over time, and the target concept to be learned from the data stream may change over time. Boosting methods are well suited for learning from data streams, but do not address this concept drift problem. This paper proposes a boosting-like method to train a classifier ensemble from data streams that naturally adapts to concept drift. Moreover, it allows the drift to be quantified in terms of its base learners. As in regular boosting, examples are re-weighted to induce a diverse ensemble of base models. To handle drift, the proposed method continuously re-weights the ensemble members based on their performance on the most recent examples only. The proposed strategy adapts quickly to different kinds of concept drift. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the data and are thus not suited for mining massive streams. The proposed algorithm has low computational costs.
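
The two re-weighting ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the decision-stump base learner, the fixed factor of 2.0 for misclassified examples, the smoothed log-odds member weight, and all names (`Stump`, `fit_stump`, `member_weight`, `DriftEnsemble`) are assumptions chosen for illustration. It only shows the general pattern: each new batch trains one base model on examples re-weighted toward the current ensemble's mistakes, and every member's vote is then recomputed from its accuracy on that most recent batch alone.

```python
import math


class Stump:
    """Hypothetical base learner: predicts `sign` if x >= threshold, else -sign."""

    def __init__(self, threshold, sign):
        self.threshold = threshold
        self.sign = sign

    def predict(self, x):
        return self.sign if x >= self.threshold else -self.sign


def fit_stump(xs, ys, ws):
    """Return the stump with the lowest weighted error on (xs, ys)."""
    best, best_err = None, float("inf")
    for t in sorted(set(xs)):
        for sign in (1, -1):
            stump = Stump(t, sign)
            err = sum(w for x, y, w in zip(xs, ys, ws) if stump.predict(x) != y)
            if err < best_err:
                best, best_err = stump, err
    return best


def member_weight(model, xs, ys):
    """Smoothed log-odds of the model's accuracy on one batch (an assumed choice)."""
    correct = sum(1 for x, y in zip(xs, ys) if model.predict(x) == y)
    acc = (correct + 1) / (len(xs) + 2)  # Laplace smoothing avoids log(0)
    return math.log(acc / (1 - acc))


class DriftEnsemble:
    def __init__(self):
        self.members = []
        self.weights = []

    def predict(self, x):
        score = sum(w * m.predict(x) for m, w in zip(self.members, self.weights))
        return 1 if score >= 0 else -1

    def update(self, xs, ys):
        # Boosting-like step: examples the current ensemble gets wrong count double.
        ws = [2.0 if self.members and self.predict(x) != y else 1.0
              for x, y in zip(xs, ys)]
        self.members.append(fit_stump(xs, ys, ws))
        # Drift handling: re-weight every member on the most recent batch only.
        self.weights = [member_weight(m, xs, ys) for m in self.members]


# Toy stream: the label rule flips after three batches (abrupt drift).
xs = [i / 20 for i in range(20)]
concept_a = [1 if x >= 0.5 else -1 for x in xs]
concept_b = [-y for y in concept_a]

ensemble = DriftEnsemble()
for _ in range(3):
    ensemble.update(xs, concept_a)
before_drift = ensemble.predict(0.7)
ensemble.update(xs, concept_b)
after_drift = ensemble.predict(0.7)
```

In this toy run the members trained before the flip receive negative weights once they are evaluated on the new batch, so the ensemble's prediction follows the new concept. The size of that weight change gives a rough signal of how much the concept has moved, loosely in the spirit of quantifying drift through the base learners.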

Country

Germany

Related Organizations

TU Dortmund University
Germany

Keywords

Boosting-like method, Drift, Base learners, Data stream, ddc:519, Mining massive streams, info:eu-repo/classification/ddc/004, Classifier ensemble, 004

Impact by BIP!

- Selected citations: 61. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.