IIvotes ensemble for imbalanced data

Publication: Article, 08 Oct 2012
Publisher: SAGE Publications
Journal: Intelligent Data Analysis, volume 16, pages 777-801 (ISSN: 1088-467X, eISSN: 1571-4128)

Authors: Jerzy Blaszczynski; Magdalena Deckert; Jerzy Stefanowski; Szymon Wilk

doi: 10.3233/ida-2012-0551


Abstract

In this paper we present IIvotes, a new framework for constructing an ensemble of classifiers from imbalanced data. IIvotes incorporates the SPIDER method for selective data pre-processing into the adaptive Ivotes ensemble. This integration aims to improve the balance between sensitivity and specificity for the minority class (evaluated by the G-mean measure) in comparison with single classifiers that are also combined with SPIDER. Using SPIDER to pre-process specific learning samples inside the ensemble improves the sensitivity of the derived component classifiers. At the same time, the controlling mechanism of IIvotes ensures that overall accuracy (and thus specificity) is kept at a reasonable level. The proposed IIvotes ensemble was thoroughly evaluated in a series of experiments in which we tested it with symbolic (decision trees and rules) and non-symbolic (Naive Bayes) component classifiers. The results confirmed that combining SPIDER with an ensemble improved performance (in terms of the G-mean measure) in comparison with a single classifier with SPIDER, for all tested types of classifiers and both SPIDER pre-processing options (weak and strong amplification). These advantages were especially evident for decision trees and rules, where the differences between single and ensemble classifiers with SPIDER were more significant for both pre-processing options than for Naive Bayes. Moreover, the results demonstrated the advantages of using a special abstaining classification strategy inside IIvotes rule ensembles, where component rule-based classifiers may refrain from predicting a class when in doubt. Abstaining rule ensembles performed much better with regard to G-mean than their non-abstaining variants.
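The G-mean measure named in the abstract is commonly defined as the geometric mean of sensitivity and specificity. The sketch below computes it from true and predicted labels, and adds a simple majority vote in which component classifiers may abstain. It is an illustration only and not the authors' implementation: the function names `g_mean` and `vote_with_abstention`, the use of `None` to mark an abstention, and the fallback to a default class are all assumptions introduced here, and the abstaining strategy actually used in IIvotes may differ.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Optional, Sequence


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], minority: int = 1) -> float:
    """Geometric mean of sensitivity and specificity for the minority class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == minority:
            if pred == minority:
                tp += 1  # minority example correctly recognised
            else:
                fn += 1  # minority example missed
        else:
            if pred == minority:
                fp += 1  # majority example wrongly flagged as minority
            else:
                tn += 1  # majority example correctly recognised
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def vote_with_abstention(votes: Sequence[Optional[Hashable]], default: Hashable) -> Hashable:
    """Majority vote over component predictions, ignoring abstentions (None).

    Hypothetical rule: if every component abstains, fall back to `default`.
    """
    cast = [v for v in votes if v is not None]
    if not cast:
        return default
    return Counter(cast).most_common(1)[0][0]


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(g_mean(y_true, y_pred))
    print(vote_with_abstention([None, 1, 0, 1], default=0))
```

Because G-mean multiplies the two rates, a classifier that always predicts the majority class has zero sensitivity and therefore scores 0 regardless of its overall accuracy, which is why the measure suits imbalanced data.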

Related Organizations

Poznań University of Technology
Poland

Impact by BIP!

- Selected citations: 9. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

