Evaluation Measures for Ordinal Regression

Publication: Article, Conference object. Date: 01 Jan 2009. Country: Italy. Publisher: IEEE. Journal: 2009 Ninth International Conference on Intelligent Systems Design and Applications

Authors: Baccianella S; Esuli A; Sebastiani F

doi: 10.1109/isda.2009.230

handle: 20.500.14243/62327

Abstract

Ordinal regression (OR -- also known as ordinal classification) has received increasing attention in recent times, due to its importance in IR applications such as learning to rank and product review rating. However, research has not paid attention to the fact that typical applications of OR often involve datasets that are highly imbalanced. An imbalanced dataset has the consequence that, when testing a system with an evaluation measure conceived for balanced datasets, a trivial system assigning all items to a single class (typically, the majority class) may even outperform genuinely engineered systems. Moreover, if this evaluation measure is used for parameter optimization, a parameter choice may result that makes the system behave very much like a trivial system. In order to avoid this, evaluation measures that can handle imbalance must be used. We propose a simple way to turn standard measures for OR into ones robust to imbalance. We also show that, when used on balanced datasets, the two versions of each measure coincide, and therefore argue that our measures should become the standard choice for OR.
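
The abstract does not spell out the construction, so the following is only a minimal sketch. It assumes the "simple way" is macroaveraging: compute the error separately for each true class, then average the per-class errors so that every class carries equal weight. Mean absolute error (MAE) is used as the standard measure here; the function names `mae` and `macro_mae` and the toy labels are illustrative, not taken from the paper.

```python
from collections import defaultdict


def mae(y_true, y_pred):
    """Standard MAE: every item weighs the same, so large classes dominate."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_mae(y_true, y_pred):
    """Macroaveraged MAE (assumed construction): average the per-class MAEs."""
    errors_by_class = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors_by_class[t].append(abs(t - p))
    per_class = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class) / len(per_class)


# Imbalanced toy data: eight items of class 3, one each of classes 1 and 5.
y_true = [3] * 8 + [1, 5]
trivial = [3] * 10                          # always predicts the majority class
engineered = [3, 3, 3, 3, 2, 2, 4, 4, 2, 4]  # imperfect, but tracks the labels

print(mae(y_true, trivial), mae(y_true, engineered))              # trivial scores lower (better)
print(macro_mae(y_true, trivial), macro_mae(y_true, engineered))  # engineered scores lower (better)

# Balanced toy data: both versions of the measure give the same value.
y_bal = [1, 1, 3, 3, 5, 5]
p_bal = [1, 2, 3, 5, 4, 5]
print(mae(y_bal, p_bal), macro_mae(y_bal, p_bal))
```

On the imbalanced data the trivial majority-class predictor wins under plain MAE but loses under the macroaveraged version, which is the failure mode the abstract describes. On the balanced data the two values agree, matching the abstract's claim that the two versions of each measure coincide.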

Country

Italy

Related Organizations

- National Research Council, Italy
- National Research Council, Sri Lanka
- Institute of Information Science and Technologies "A. Faedo", Italy

Keywords

Ordinal classification, Ordinal regression, Evaluation measures

Impact by BIP!

- selected citations: 133. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.