In-Memory Distributed Matrix Computation Processing and Optimization

Article, Conference object; 01 Apr 2017; Qatar; Publisher: IEEE; Journal: 2017 IEEE 33rd International Conference on Data Engineering (ICDE)

Authors: Yu, Yongyang; Tang, Mingjie; Aref, Walid G.; Malluhi, Qutaibah M.; Abbas, Mostafa M.; Ouzzani, Mourad

doi: 10.1109/icde.2017.150

handle: 10576/16192

Abstract

The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains, ranging from business intelligence and bioinformatics to self-driving cars. These methods rely heavily on matrix computations, so it is critical to make these computations scalable and efficient. Such matrix computations are often complex and involve multiple steps that must be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. The paper introduces an evaluation plan generator for complex matrix computations, as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of one matrix operation often serves as an input to another, which defines the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix according to the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.
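The abstract does not spell out the sparsity estimator or the cost model. The following is therefore only a minimal Python sketch of the general idea that sparsity estimates can drive the choice of evaluation order for a chain of matrix products. It is not the paper's algorithm.

The sketch rests on several assumptions. It uses the common uniform-independence estimate for the density of a product. It uses a classic dynamic-programming search over parenthesizations. It approximates cost as the expected number of scalar multiply-adds. It ignores the communication and partitioning costs that the paper targets. The names `product_sparsity` and `plan_chain` are hypothetical.

```python
from functools import lru_cache
from typing import List, Optional, Tuple


def product_sparsity(s_a: float, s_b: float, k: int) -> float:
    """Estimated density of C = A @ B, where A is m x k and B is k x n.

    Assumption (not from the paper): non-zeros are placed uniformly and
    independently, so C[i][j] is zero only if all k products A[i][l] * B[l][j]
    are zero, each with probability 1 - s_a * s_b.
    """
    return 1.0 - (1.0 - s_a * s_b) ** k


def plan_chain(dims: List[int], sparsity: List[float]) -> Tuple[float, str]:
    """Choose a multiplication order for M0 @ M1 @ ... @ M(n-1).

    dims has length n + 1 (Mi is dims[i] x dims[i+1]); sparsity[i] is the
    density of Mi. The cost of multiplying X (p x q, density sx) by
    Y (q x r, density sy) is approximated as p * q * r * sx * sy.
    Returns (estimated total cost, parenthesized plan).
    """
    n = len(sparsity)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[float, float, str]:
        # Returns (cost, estimated density of the result, plan) for Mi..Mj.
        if i == j:
            return 0.0, sparsity[i], f"M{i}"
        result: Optional[Tuple[float, float, str]] = None
        for split in range(i, j):
            left_cost, left_s, left_plan = best(i, split)
            right_cost, right_s, right_plan = best(split + 1, j)
            p, q, r = dims[i], dims[split + 1], dims[j + 1]
            cost = left_cost + right_cost + p * q * r * left_s * right_s
            if result is None or cost < result[0]:
                # Propagate the estimated density of the intermediate result.
                density = product_sparsity(left_s, right_s, q)
                result = (cost, density, f"({left_plan} @ {right_plan})")
        assert result is not None
        return result

    cost, _, plan = best(0, n - 1)
    return cost, plan


if __name__ == "__main__":
    # Dense 10x100, 100x5, 5x50 chain: multiplying the first two is far cheaper.
    cost, plan = plan_chain([10, 100, 5, 50], [1.0, 1.0, 1.0])
    print(cost, plan)  # 7500.0 ((M0 @ M1) @ M2)
```

For dense inputs this reduces to the textbook matrix-chain ordering problem. With sparse inputs, the propagated density estimates can change which intermediate results are cheapest to materialize, which is why estimating the sparsity of intermediates matters for plan generation.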

Country

Qatar

Related Organizations

Hamad bin Khalifa University (Qatar)
Qatar University (Qatar)
Qatar Computing Research Institute (Qatar)
Purdue University West Lafayette (United States)
Purdue University System (United States)

Keywords

Matrix computation, Query optimization, Distributed computing

Impact by BIP!

- Selected citations: 15. Derived from selected sources; an alternative to the "Influence" indicator.
- Popularity: Top 10%. Reflects the current impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering