
arXiv: 1607.00291
Tensor computations, in particular tensor contraction (TC), are important kernels in many scientific computing applications. Because TC is fundamentally similar to matrix multiplication (MM), and optimized implementations such as the BLAS are widely available, tensor operations have traditionally been implemented in terms of BLAS operations, incurring both a performance and a storage overhead. Instead, we implement TC using the flexible BLIS framework, which allows transposition (reshaping) of the tensor to be fused with internal partitioning and packing operations, requiring no explicit transposition operations or additional workspace. This implementation, TBLIS, achieves performance approaching that of MM, and in some cases considerably higher than that of traditional TC. Our implementation supports multithreading using an approach identical to that used for MM in BLIS, with similar performance characteristics. The complexity of managing tensor-to-matrix transformations is also handled automatically in our approach, greatly simplifying its use in scientific applications.
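The idea of a contraction that needs no explicit transposition can be illustrated with a minimal pure-Python sketch, assuming dense row-major tensors stored in flat lists. The traditional approach copies each tensor into a transposed matrix layout before calling MM, which costs extra workspace. This sketch instead precomputes tables that map each grouped row index, column index, and contracted index of the logical matrix to a memory offset in the original tensor, then reads elements in place. The helper names (`strides_for`, `offsets`, `contract`) are hypothetical and illustrative only. This is not the TBLIS API, and it omits the blocking, packing, and optimized micro-kernels that give BLIS its performance.

```python
from itertools import product


def strides_for(shape):
    """Row-major strides (last index varies fastest) for a dense tensor."""
    strides, step = [], 1
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return strides[::-1]


def offsets(shape, strides, dims):
    """Memory offset of every index combination over the tensor dimensions
    listed in `dims`, enumerated in row-major order over those dimensions.
    This is the tensor-to-matrix mapping for one grouped matrix index."""
    return [sum(i * strides[d] for d, i in zip(dims, idx))
            for idx in product(*(range(shape[d]) for d in dims))]


def contract(A, a_shape, B, b_shape, a_free, a_con, b_con, b_free):
    """Compute C[m][n] = sum_k A[m, k] * B[k, n], where m groups the free
    dimensions of A, n the free dimensions of B, and k the contracted ones.
    `a_con` and `b_con` must list matching dimensions in the same order.
    A and B are read in place: no transposed copy of either is made."""
    a_str, b_str = strides_for(a_shape), strides_for(b_shape)
    rows = offsets(a_shape, a_str, a_free)  # m -> offset into A
    ka = offsets(a_shape, a_str, a_con)     # k -> offset into A
    kb = offsets(b_shape, b_str, b_con)     # k -> offset into B
    cols = offsets(b_shape, b_str, b_free)  # n -> offset into B
    return [[sum(A[r + x] * B[y + c] for x, y in zip(ka, kb)) for c in cols]
            for r in rows]


# Usage: C[a][b] = sum over c, d of A[a, c, d] * B[d, c, b]
A = [float(i) for i in range(2 * 3 * 4)]      # shape (2, 3, 4): (a, c, d)
B = [float(i % 7) for i in range(4 * 3 * 5)]  # shape (4, 3, 5): (d, c, b)
C = contract(A, (2, 3, 4), B, (4, 3, 5),
             a_free=[0], a_con=[1, 2], b_con=[1, 0], b_free=[2])
```

Because the offset tables encode the reshaping, the same MM-style loop serves any assignment of tensor dimensions to rows, columns, and the contracted index; only the tables change.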
24 pages, 8 figures, uses pgfplots
FOS: Computer and information sciences, G.4, Computer Science - Performance, tensor-to-matrix transformation, matrix multiplication, Other matrix algorithms, tensor contraction, high-performance computing, 15A69, Performance (cs.PF), multilinear algebra, Complexity and performance of numerical algorithms, Computer Science - Distributed, Parallel, and Cluster Computing, Multilinear algebra, tensor calculus, Computer Science - Mathematical Software, Distributed, Parallel, and Cluster Computing (cs.DC), Mathematical Software (cs.MS), performance
