Matrix completion discriminant analysis

Publication: Article, 01 Dec 2015, English

Publisher: Elsevier BV

Journal: Computational Statistics & Data Analysis, volume 92, pages 115-125 (ISSN: 0167-9473)

Authors: Tong Tong Wu; Kenneth Lange;

doi: 10.1016/j.csda.2015.06.006

pmid: 26549920

pmc: PMC4634674

Abstract

Matrix completion discriminant analysis (MCDA) is designed for semi-supervised learning where the rate of missingness is high and predictors vastly outnumber cases. MCDA operates by mapping class labels to the vertices of a regular simplex. With c classes, these vertices are arranged on the surface of the unit sphere in c - 1 dimensional Euclidean space. Because all pairs of vertices are equidistant, the classes are treated symmetrically. To assign unlabeled cases to classes, the data is entered into a large matrix (cases along rows and predictors along columns) that is augmented by vertex coordinates stored in the last c - 1 columns. Once the matrix is constructed, its missing entries can be filled in by matrix completion. To carry out matrix completion, one minimizes a sum of squares plus a nuclear norm penalty. The simplest solution invokes an MM algorithm and singular value decomposition. Choice of the penalty tuning constant can be achieved by cross validation on randomly withheld case labels. Once the matrix is completed, an unlabeled case is assigned to the class vertex closest to the point deposited in its last c - 1 columns. A variety of examples drawn from the statistical literature demonstrate that MCDA is competitive on traditional problems and outperforms alternatives on large-scale problems.
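The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`simplex_vertices`, `soft_impute`, `mcda_predict`), the use of `NaN` to mark missing entries, the coding of unlabeled cases as `-1`, the stopping rule, and the particular simplex construction (centering the standard basis and projecting onto c - 1 dimensions) are all choices made here for illustration. The completion step uses the familiar fill-in-then-soft-threshold iteration, which is one MM scheme for a sum of squares over observed entries plus a nuclear norm penalty; the abstract does not spell out the exact update.

```python
import numpy as np


def simplex_vertices(c):
    """Return c unit vectors in R^(c-1) forming a regular simplex (assumes c >= 2)."""
    centered = np.eye(c) - 1.0 / c           # center the standard basis of R^c
    _, _, vt = np.linalg.svd(centered)       # first c-1 right vectors span the centered subspace
    coords = centered @ vt[: c - 1].T        # isometric coordinates in R^(c-1)
    return coords / np.linalg.norm(coords, axis=1, keepdims=True)


def soft_impute(y, lam, n_iter=200, tol=1e-6):
    """Complete y (NaN = missing) by iterating fill-in and singular value soft-thresholding."""
    observed = ~np.isnan(y)
    x = np.zeros_like(y)
    for _ in range(n_iter):
        # Majorization step: keep observed entries, fill the rest from the current iterate.
        z = np.where(observed, y, x)
        # Minimization step: shrink the singular values of the filled-in matrix by lam.
        u, s, vt = np.linalg.svd(z, full_matrices=False)
        x_new = (u * np.maximum(s - lam, 0.0)) @ vt
        converged = np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x))
        x = x_new
        if converged:
            break
    return x


def mcda_predict(features, labels, c, lam):
    """Assign every case to the nearest class vertex after matrix completion.

    features: (n, p) float array, NaN where a predictor is missing.
    labels:   (n,) int array with values 0..c-1, or -1 for unlabeled cases
              (this coding is an assumption of the sketch).
    """
    vertices = simplex_vertices(c)
    targets = np.full((features.shape[0], c - 1), np.nan)
    labeled = labels >= 0
    targets[labeled] = vertices[labels[labeled]]
    # Augment the data matrix with vertex coordinates in the last c-1 columns.
    completed = soft_impute(np.hstack([features, targets]), lam)
    points = completed[:, -(c - 1):]
    dists = np.linalg.norm(points[:, None, :] - vertices[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

With this construction every pair of vertices sits at distance sqrt(2c / (c - 1)), so no class is favored. In practice the penalty constant `lam` would be chosen by cross validation on randomly withheld case labels, as the abstract notes; that loop is omitted here.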

Related Organizations

University of Rochester, United States
University of California, Los Angeles, United States

Keywords

semi-supervised learning, Classification and discrimination; cluster analysis (statistical aspects), Numerical linear algebra, singular value decomposition, Matrix completion problems, missing observations, MM algorithm, classification, Computational methods for problems pertaining to statistics

Impact by BIP!

- Selected citations: 6. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.