Fast Parallel Randomized Algorithm for Nonnegative Matrix Factorization with KL Divergence for Large Sparse Datasets

Article, Preprint (01 Apr 2016); embargo end date: 01 Jan 2016

Publisher: EJournal Publishing

Journal: International Journal of Machine Learning and Computing, volume 6, pages 111-116 (ISSN: 2010-3700)

Authors: Nguyen, Duy; Ho, Tu

DOI: 10.18178/ijmlc.2016.6.2.583, 10.48550/arxiv.1604.04026

arXiv: 1604.04026

Abstract

Nonnegative Matrix Factorization (NMF) with Kullback-Leibler divergence (NMF-KL) is one of the most important NMF problems and is equivalent to Probabilistic Latent Semantic Indexing (PLSI), which has been applied successfully in many domains. For sparse count data, a Poisson distribution with KL divergence yields sparse models and sparse representations that describe the random variation better than a normal distribution with the Frobenius norm. In particular, sparse models give a more concise view of how attributes appear across latent components, while sparse representations give a more concise interpretation of how latent components contribute to each instance. However, solving NMF with KL divergence is much more difficult than solving NMF with the Frobenius norm, and obtaining sparse models, sparse representations and fast algorithms for large sparse datasets remains challenging for NMF-KL. In this paper, we propose a fast parallel randomized coordinate descent algorithm that converges quickly on large sparse datasets and achieves sparse models and sparse representations. In our experiments, the proposed algorithm outperforms existing methods for this problem.
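To make the NMF-KL objective concrete: the generalized KL divergence is D(V‖WH) = Σ_ij [V_ij log(V_ij/(WH)_ij) − V_ij + (WH)_ij]. For sparse V, only the nonzeros need (WH)_ij, because Σ_ij (WH)_ij = Σ_k (Σ_i W_ik)(Σ_j H_kj). The Python sketch below (NumPy/SciPy) computes that objective and runs a randomized coordinate descent in which columns and coordinates are visited in random order and each entry takes a projected Newton step with step halving. It is a minimal serial illustration under these assumptions, not the authors' parallel algorithm. The function names `kl_divergence`, `update_H` and `nmf_kl`, the initialization and the halving safeguard are all hypothetical choices made for this example.

```python
# Hypothetical sketch: serial randomized coordinate descent for NMF-KL,
# not the paper's parallel algorithm.
import numpy as np
import scipy.sparse as sp


def kl_divergence(V, W, H, eps=1e-12):
    """Generalized KL divergence D(V || WH), evaluated without forming WH.

    Only the nonzeros of V need (WH)_ij; the sum of all entries of WH
    factorizes as sum_k (sum_i W_ik) * (sum_j H_kj).
    """
    V = sp.csr_matrix(V).tocoo()             # COO view of V's nonzeros
    wh = np.einsum("nk,nk->n", W[V.row], H[:, V.col].T)  # (WH)_ij at nonzeros
    v = V.data
    nz_terms = np.sum(v * np.log((v + eps) / (wh + eps)) - v)
    return nz_terms + W.sum(axis=0) @ H.sum(axis=1)


def update_H(V, W, H, rng, eps=1e-12, max_halvings=20):
    """One randomized coordinate-descent sweep over H with W fixed (in place).

    Each H[k, j] takes a projected Newton step on its 1-D subproblem; the
    step is halved until the objective does not increase.
    """
    Vc = sp.csc_matrix(V)
    col_sums = W.sum(axis=0)                 # gradient of the linear term
    for j in rng.permutation(Vc.shape[1]):   # visit columns in random order
        start, end = Vc.indptr[j], Vc.indptr[j + 1]
        rows, v = Vc.indices[start:end], Vc.data[start:end]
        if rows.size == 0:
            H[:, j] = 0.0                    # empty column: optimum is h = 0
            continue
        Wj = W[rows]                         # only rows with V[i, j] > 0
        wh = Wj @ H[:, j]                    # current (WH)_ij on those rows
        for k in rng.permutation(W.shape[1]):  # coordinates in random order
            old, w = H[k, j], Wj[:, k]
            base = wh - w * old              # (WH)_ij without coordinate k

            def f(x):                        # 1-D objective up to a constant
                return col_sums[k] * x - np.sum(v * np.log(base + w * x + eps))

            ratio = v / (wh + eps)
            grad = col_sums[k] - w @ ratio
            hess = (w * w) @ (ratio / (wh + eps)) + eps
            step, f_old, new = -grad / hess, f(old), old
            for _ in range(max_halvings):    # projected Newton + step halving
                cand = max(0.0, old + step)
                if f(cand) <= f_old:
                    new = cand
                    break
                step *= 0.5
            H[k, j] = new
            wh = base + w * new
    return H


def nmf_kl(V, rank, n_iter=50, seed=0):
    """Alternate H- and W-sweeps; the W-sweep is the same problem on V^T."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    for _ in range(n_iter):
        update_H(V, W, H, rng)
        update_H(V.T, H.T, W.T, rng)         # writes into W through the view
    return W, H
```

Because the W-subproblem is the H-subproblem on the transpose, one routine serves both. With W fixed, the columns of H are independent subproblems, which is where a parallel version would naturally split the work; this sketch stays serial for clarity.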

Keywords

FOS: Computer and information sciences, FOS: Mathematics, Machine Learning (cs.LG), Optimization and Control (math.OC), Numerical Analysis (math.NA), bepress: Physical Sciences and Mathematics / Mathematics, bepress: Physical Sciences and Mathematics / Computer Sciences

Impact (by BIP!)

	selected citations: 8
	popularity: Average
	influence: Average
	impulse: Average

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering