On Strong Consistency of Model Selection in Classification

Article, published 1 Nov 2006. Publisher: Institute of Electrical and Electronics Engineers (IEEE). Journal: IEEE Transactions on Information Theory, volume 52, pages 4767-4774 (ISSN: 0018-9448).

Author: Joe Suzuki

doi: 10.1109/tit.2006.883611



Abstract

This paper considers model selection in classification. In many applications, such as pattern recognition, probabilistic inference using a Bayesian network, and prediction of the next element in a sequence based on a Markov chain, the conditional probability $P(Y=y\mid X=x)$ of class $y\in\mathcal{Y}$ given attribute value $x\in\mathcal{X}$ is utilized. By model we mean the equivalence relation on $\mathcal{X}$ defined, for $x,x'\in\mathcal{X}$, by $x\sim x' \iff P(Y=y\mid X=x)=P(Y=y\mid X=x')$ for all $y\in\mathcal{Y}$. By classification we mean that the number of such equivalence classes is finite. We estimate the model from $n$ samples $z^n=(x_i,y_i)_{i=1}^{n}\in(\mathcal{X}\times\mathcal{Y})^n$ using information criteria of the form empirical entropy $H$ plus a penalty term $(k/2)d_n$ (the estimated model is the one that minimizes $H+(k/2)d_n$), where $k$ is the number of independent parameters in the model and $\{d_n\}_{n=1}^{\infty}$ is a nonnegative real sequence such that $\limsup_{n} d_n/n=0$. For autoregressive processes, although the definitions of $H$ and $k$ differ, it is known (Hannan and Quinn) that the estimated model almost surely coincides with the true model as $n\to\infty$ if $\{d_n\}_{n=1}^{\infty}>\{2\log\log n\}_{n=1}^{\infty}$, and that it does not if $\{d_n\}_{n=1}^{\infty}<\{2\log\log n\}_{n=1}^{\infty}$. Whether the same property holds for classification had been an open problem. This paper solves it in the affirmative.
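To make the criterion concrete, here is a brute-force Python sketch for a tiny attribute alphabet. It rests on assumptions the abstract does not spell out: that $H$ is the empirical conditional entropy summed over all $n$ samples (the negative maximized log-likelihood, in nats), and that a model with $m$ equivalence classes has $k = m(|\mathcal{Y}|-1)$ independent parameters. The function names `partitions`, `empirical_entropy`, `select_model` and `true_model` are hypothetical, and exhaustive search over every set partition of $\mathcal{X}$ is only an illustration of the criterion, not the paper's procedure.

```python
import math
from collections import Counter
from typing import Callable, Iterator


def partitions(items: list) -> Iterator[list]:
    """Yield every set partition of `items` (Bell-number many, so tiny alphabets only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + smaller


def canonical(blocks) -> tuple:
    """Order-independent representation of a partition, for comparison."""
    return tuple(sorted(tuple(sorted(b)) for b in blocks))


def empirical_entropy(samples: list, partition: list) -> float:
    """Assumed H: -sum_i log P_hat(y_i | block of x_i), in nats, over all n samples."""
    block_of = {x: b for b, block in enumerate(partition) for x in block}
    joint = Counter((block_of[x], y) for x, y in samples)
    totals = Counter(block_of[x] for x, _ in samples)
    return -sum(c * math.log(c / totals[b]) for (b, _), c in joint.items())


def select_model(samples, x_alphabet, y_alphabet, d: Callable[[int], float]) -> tuple:
    """Return the partition of X minimizing H + (k/2) d_n, by exhaustive search."""
    d_n = d(len(samples))
    best, best_score = None, math.inf
    for part in partitions(list(x_alphabet)):
        k = len(part) * (len(y_alphabet) - 1)  # assumed parameter count per model
        score = empirical_entropy(samples, part) + (k / 2) * d_n
        if score < best_score:
            best, best_score = part, score
    return canonical(best)


def true_model(cond: dict) -> tuple:
    """Classes of x ~ x' <=> P(Y=.|X=x) == P(Y=.|X=x') (exact float equality here)."""
    classes: dict = {}
    for x, dist in cond.items():
        classes.setdefault(tuple(dist), []).append(x)
    return canonical(classes.values())


def hannan_quinn(n: int) -> float:
    return 2 * math.log(math.log(n))  # d_n = 2 log log n


def bic(n: int) -> float:
    return math.log(n)  # d_n = log n


# Synthetic sample whose counts match `cond` exactly: 100 samples per attribute value.
cond = {0: (0.8, 0.2), 1: (0.8, 0.2), 2: (0.2, 0.8), 3: (0.2, 0.8)}
samples = [(x, y) for x, p in cond.items() for y in (0, 1) for _ in range(round(100 * p[y]))]

print(true_model(cond))                                         # → ((0, 1), (2, 3))
print(select_model(samples, list(cond), (0, 1), hannan_quinn))  # → ((0, 1), (2, 3))
```

With $d_n = 2\log\log n$ the penalty becomes $k\log\log n$, the Hannan-Quinn rate mentioned in the abstract; $d_n = \log n$ gives the familiar $(k/2)\log n$ penalty of BIC/MDL. Both satisfy $\limsup_n d_n/n = 0$. On this synthetic sample both recover the partition $\{\{0,1\},\{2,3\}\}$ that the equivalence relation induces from the generating table: refinements of it have the same $H$ but larger $k$, while any partition that merges $x$ values with different conditional distributions pays a much larger entropy cost.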

Related Organizations

Osaka University
Japan

Impact (by BIP!)

- Selected citations: 12 (citations derived from selected sources)
- Popularity: Average (the current impact/attention of the article in the research community, based on the underlying citation network)
- Influence: Top 10% (the overall/total impact of the article in the research community, based on the underlying citation network, diachronically)
- Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)