Enhancing performance of Tall-Skinny QR factorization using FPGAs

Article, Conference object · 01 Aug 2012 · Singapore
Publisher: IEEE
Journal: 22nd International Conference on Field Programmable Logic and Applications (FPL)
Funded by: FCT | D4, UKRI | Real-time Numerical Optim...

Authors: Abid Rafique; Nachiket Kapre; George A. Constantinides;

doi: 10.1109/fpl.2012.6339142

Abstract

Communication-avoiding linear algebra algorithms with low communication latency and high memory bandwidth requirements, such as Tall-Skinny QR factorization (TSQR), are well suited to acceleration using FPGAs. TSQR parallelizes the QR factorization of tall-skinny matrices in a divide-and-conquer fashion: it decomposes the matrix into sub-matrices, performs local QR factorizations, and then merges the intermediate results. Because TSQR is a dense linear algebra problem, one might expect a GPU to perform better. However, GPU performance is limited by memory bandwidth in the local QR factorizations and by global communication latency in the merge stage. We exploit the shape of the matrix and propose an FPGA-based custom architecture that avoids these bottlenecks by using high-bandwidth on-chip memories for the local QR factorizations and by performing the merge stage entirely on-chip to reduce communication latency. We achieve a peak double-precision floating-point performance of 129 GFLOPs on a Virtex-6 SX475T. A quantitative comparison of our proposed design with recent QR factorization implementations on FPGAs and GPUs shows speedups of up to 7.7× and 12.7×, respectively. Additionally, we show even higher performance gains over optimized linear algebra libraries such as Intel MKL for multi-cores, CULA for GPUs, and MAGMA for hybrid systems.
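The divide-and-conquer structure described in the abstract (split into row blocks, factor each block locally, merge the intermediate R factors) can be illustrated with a minimal software sketch. This is not the authors' FPGA architecture: the function names `householder_r` and `tsqr_r`, the use of Householder reflections for the local factorizations, and the pairwise (binary-tree) merge order are all assumptions made for illustration, and only the R factor is computed.

```python
import math


def householder_r(a):
    """Return the n x n upper-triangular R factor of an m x n matrix (m >= n)."""
    r = [row[:] for row in a]  # work on a copy of the input rows
    m, n = len(r), len(r[0])
    for k in range(n):
        # Norm of column k from the diagonal downwards.
        norm = math.sqrt(sum(r[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue
        # Householder vector v = x - alpha * e1, with alpha = -sign(x0) * ||x||.
        alpha = -norm if r[k][k] >= 0 else norm
        v = [r[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            dot = sum(v[i] * r[k + i][j] for i in range(len(v)))
            scale = 2.0 * dot / vtv
            for i in range(len(v)):
                r[k + i][j] -= scale * v[i]
    # Keep the top n rows and clear round-off below the diagonal.
    return [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, block_rows):
    """R factor of a tall-skinny matrix via local QRs and a pairwise merge."""
    n = len(a[0])
    if block_rows < n or len(a) < n:
        raise ValueError("blocks and matrix need at least as many rows as columns")
    blocks = [a[i:i + block_rows] for i in range(0, len(a), block_rows)]
    # A short trailing block is folded into its predecessor.
    if len(blocks) > 1 and len(blocks[-1]) < n:
        tail = blocks.pop()
        blocks[-1] = blocks[-1] + tail
    # Local stage: independent QR factorizations of each row block.
    rs = [householder_r(b) for b in blocks]
    # Merge stage: stack pairs of R factors and factor the 2n x n result.
    while len(rs) > 1:
        merged = [householder_r(rs[i] + rs[i + 1])
                  for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2:
            merged.append(rs[-1])  # odd one out moves up a level unchanged
        rs = merged
    return rs[0]


if __name__ == "__main__":
    # Hypothetical 8 x 3 test matrix (Vandermonde-style, full column rank).
    matrix = [[float((i + 1) ** j) for j in range(3)] for i in range(8)]
    for row in tsqr_r(matrix, block_rows=3):
        print(row)
```

The local factorizations are independent of one another, which is the property that allows them to run in parallel; the merge stage only ever handles small stacked triangular factors. The R factor is unique up to the sign of each row, so the result can be compared with a direct factorization entry by entry in absolute value.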

Country

Singapore

Related Organizations

Nanyang Technological University, Singapore
Imperial College London, United Kingdom

Keywords

Computer Science and Engineering

Related research

Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection (2015)
Exploring Dual-Triangular Structure for Efficient R-Initiated Tall-Skinny QR on GPGPU (2019)
Reconstructing Householder vectors from Tall-Skinny QR (2014)

Impact by BIP!

Selected citations: 11
Popularity: Average
Influence: Top 10%
Impulse: Top 10%