High-performance Tensor Contractions for GPUs

Publication: Article, Conference object
Date: 01 Jan 2016
Language: English
Publisher: Elsevier BV
Journal: Procedia Computer Science, volume 80, pages 108-118 (ISSN: 1877-0509)

Authors: Abdelfattah, Ahmad; Baboulin, Marc; Dobrev, Veselin; Dongarra, Jack; Earl, Christopher; Falcou, Joël; Haidar, Azzam; +4 Authors

doi: 10.1016/j.procs.2016.05.302

Abstract

We present a computational framework for high-performance tensor contractions on GPUs. High performance is difficult to obtain using existing libraries, especially for many independent contractions where each contraction is very small, e.g., sub-vector/warp in size. However, using our framework to batch contractions, together with application specifics, we demonstrate close-to-peak performance results. In particular, to accelerate large-scale tensor-formulated high-order finite element method (FEM) simulations, which is the main focus and motivation for this work, we represent contractions as tensor index reordering plus matrix-matrix multiplications (GEMMs). This is a key factor in achieving many-fold algorithmic acceleration (vs. not using it) due to possible reuse of data loaded in fast memory. In addition to using this context knowledge, we design tensor data structures, tensor algebra interfaces, and new tensor contraction algorithms and implementations to achieve 90+% of a theoretically derived peak on GPUs. On a K40c GPU, for contractions resulting in GEMMs on square matrices of size 8, for example, we are 2.8× faster than CUBLAS, and 8.5× faster than MKL on 16 cores of Intel Xeon E5-2670 (Sandy Bridge) 2.60 GHz CPUs. Finally, we apply autotuning and code generation techniques to simplify tuning and provide an architecture-aware, user-friendly interface.
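
To make the phrase "tensor index reordering plus matrix-matrix multiplications" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the contraction chosen (C[i][l][j] = sum over k of T[i][k][j] * M[k][l]), the function names, and the nested-list storage are all assumptions made for illustration. The sketch reorders the tensor so that the contracted index becomes the inner dimension of a single GEMM, then reorders the result back. The Python loop over the batch only stands in for the batching described in the abstract.

```python
from itertools import product


def contract_direct(T, M):
    """Reference: C[i][l][j] = sum_k T[i][k][j] * M[k][l], by explicit loops."""
    ni, nk, nj = len(T), len(T[0]), len(T[0][0])
    nl = len(M[0])
    C = [[[0.0] * nj for _ in range(nl)] for _ in range(ni)]
    for i, l, j, k in product(range(ni), range(nl), range(nj), range(nk)):
        C[i][l][j] += T[i][k][j] * M[k][l]
    return C


def gemm(A, B):
    """Plain matrix-matrix product on nested lists (stand-in for a tuned GEMM)."""
    n, m = len(B), len(B[0])
    return [[sum(row[k] * B[k][c] for k in range(n)) for c in range(m)] for row in A]


def contract_via_gemm(T, M):
    """Same contraction, written as index reordering + one GEMM + index reordering."""
    ni, nk, nj = len(T), len(T[0]), len(T[0][0])
    nl = len(M[0])
    # Reorder T[i][k][j] -> T2[i * nj + j][k] so that k is the GEMM inner dimension.
    T2 = [[T[i][k][j] for k in range(nk)] for i in range(ni) for j in range(nj)]
    C2 = gemm(T2, M)  # shape: (ni * nj) x nl
    # Reorder C2[i * nj + j][l] back to C[i][l][j].
    return [[[C2[i * nj + j][l] for j in range(nj)] for l in range(nl)] for i in range(ni)]


def batched_contract(batch, M):
    """Apply the same small contraction to every tensor in a batch (hypothetical helper)."""
    return [contract_via_gemm(T, M) for T in batch]
```

Once the contraction is in GEMM form, each small problem can reuse the operand M after it has been loaded, which is the kind of data reuse the abstract credits for the speedup.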

Related Organizations

- Lawrence Berkeley National Laboratory, United States
- Lawrence Livermore National Laboratory, United States
- University of Paris-Saclay, France
- French National Centre for Scientific Research, France
- University of Salford, United Kingdom

Keywords

FEM, [INFO.INFO-NA] Computer Science [cs]/Numerical Analysis [cs.NA], Applications, GPU, Tensor contractions, Batched linear algebra, Tensor HPC

Impact by BIP!

- selected citations: 44. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Access routes: Green, gold
