Shared-memory and shared-nothing stochastic gradient descent algorithms for matrix completion

Publication: Article, 15 Feb 2014, English
Publisher: Springer Science and Business Media LLC
Journal: Knowledge and Information Systems, volume 42, pages 493-523 (ISSN: 0219-1377, eISSN: 0219-3116)

Authors: Makari, Faraz; Teflioudi, Christina; Gemulla, Rainer; Haase, Peter; Sismanis, Yannis;

doi: 10.1007/s10115-013-0718-7

handle: 11858/00-001M-0000-0024-4F57-9

Abstract

We provide parallel algorithms for large-scale matrix completion on problems with millions of rows, millions of columns, and billions of revealed entries. We focus on in-memory algorithms that run either in a shared-memory environment on a powerful compute node or in a shared-nothing environment on a small cluster of commodity nodes; even very large problems can be handled effectively in these settings. Our ASGD, DSGD-MR, DSGD++, and CSGD algorithms are novel variants of the popular stochastic gradient descent (SGD) algorithm, with the latter three algorithms based on a new "stratified SGD" approach. All of the algorithms are cache-friendly and exploit thread-level parallelism, in-memory processing, and asynchronous communication. We investigate the performance of both new and existing algorithms via a theoretical complexity analysis and a set of large-scale experiments. The results show that CSGD is more scalable, and up to 60 % faster, than the best-performing alternative method in the shared-memory setting. DSGD++ is superior in terms of overall runtime, memory consumption, and scalability in the shared-nothing setting. For example, DSGD++ can solve a difficult matrix completion problem on a high-variance matrix with 10M rows, 1M columns, and 10B revealed entries in around 40 min on 16 compute nodes. In general, algorithms based on SGD appear to perform better than algorithms based on alternating minimizations, such as the PALS and DALS alternating least-squares algorithms.
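
For orientation, the sequential SGD baseline that the abstract's algorithms build on can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `sgd_matrix_completion` and `rmse`, the squared-error loss with L2 regularization, and every hyperparameter value are assumptions chosen for the example. The parallel ASGD, DSGD-MR, DSGD++, and CSGD variants are not shown.

```python
import math
import random


def sgd_matrix_completion(entries, n_rows, n_cols, rank=2, lr=0.05,
                          reg=0.01, epochs=500, seed=0):
    """Fit factors W (n_rows x rank) and H (n_cols x rank) with plain SGD.

    entries: list of (i, j, value) triples for the revealed cells.
    The prediction for cell (i, j) is the dot product of W[i] and H[j].
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Small positive random initialization of both factor matrices.
    W = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    order = list(entries)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the revealed entries in random order
        for i, j, value in order:
            w, h = W[i], H[j]
            err = value - sum(wk * hk for wk, hk in zip(w, h))
            for k in range(rank):
                w_k, h_k = w[k], h[k]  # keep old values for both updates
                w[k] += lr * (err * h_k - reg * w_k)
                h[k] += lr * (err * w_k - reg * h_k)
    return W, H


def rmse(entries, W, H):
    """Root-mean-square error of the factorization on the given entries."""
    total = 0.0
    for i, j, value in entries:
        pred = sum(wk * hk for wk, hk in zip(W[i], H[j]))
        total += (value - pred) ** 2
    return math.sqrt(total / len(entries))


if __name__ == "__main__":
    # Toy rank-1 matrix (outer product of [1, 2, 3] and [1, 2]);
    # cell (2, 1) is left unrevealed.
    revealed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
    W, H = sgd_matrix_completion(revealed, n_rows=3, n_cols=2)
    print("training RMSE:", rmse(revealed, W, H))
```

Each update touches only one row of W and one row of H, so updates for entries that share no row and no column do not interfere with one another.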

Keywords

004

Impact by BIP!

- Selected citations: 14 (derived from selected sources; an alternative to the "influence" indicator)
- Popularity: Top 10% (the "current" impact or attention of the article in the research community at large, based on the underlying citation network)
- Influence: Top 10% (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
- Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)