Deviance matrix factorization

Publication: Article, Preprint, 01 Jan 2023

Embargo end date: 01 Jan 2021

Publisher: Institute of Mathematical Statistics

Journal: Electronic Journal of Statistics, volume 17 (ISSN: 1935-7524)

Authors: Liang Wang; Luis Carvalho

doi: 10.1214/23-ejs2174, 10.48550/arxiv.2110.05674

arXiv: 2110.05674

Abstract

We investigate a general matrix factorization for deviance-based data losses, extending the ubiquitous singular value decomposition beyond squared error loss. While similar approaches have been explored before, our method leverages classical statistical methodology from generalized linear models (GLMs) and provides an efficient algorithm that is flexible enough to allow for structural zeros via entry weights. Moreover, by adapting results from GLM theory, we provide support for these decompositions by (i) showing strong consistency under the GLM setup, (ii) checking the adequacy of a chosen exponential family via a generalized Hosmer-Lemeshow test, and (iii) determining the rank of the decomposition via a maximum eigenvalue gap method. To further support our findings, we conduct simulation studies to assess robustness to decomposition assumptions and extensive case studies using benchmark datasets from image face recognition, natural language processing, network analysis, and biomedical studies. Our theoretical and empirical results indicate that the proposed decomposition is more flexible, general, and robust, and can thus provide improved performance when compared to similar methods. To facilitate applications, an R package with efficient model fitting and family and rank determination is also provided.

Related Organizations

Boston University, United States
Boston College, United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, principal component analysis, Statistics, Machine Learning (stat.ML), Statistics - Applications, Statistics - Computation, Machine Learning (cs.LG), non-negative matrix factorization, Methodology (stat.ME), factor models, Statistics - Machine Learning, Applications (stat.AP), Statistics - Methodology, Computation (stat.CO)

Impact by BIP!

	selected citations	1
	popularity	Average
	influence	Average
	impulse	Average
