
doi: 10.1137/18m1210101
handle: 10754/656844
Summary: Randomized algorithms for the generation of low rank approximations of large dense matrices have become popular methods in scientific computing and machine learning. In this paper, we extend the scope of these methods and present batched GPU randomized algorithms for the efficient generation of low rank representations of large sets of small dense matrices, as well as their generalization to the construction of hierarchically low rank symmetric \(\mathcal{H}^2\) matrices with general partitioning structures. In both cases, the algorithms need to access the matrices only through matrix-vector multiplication operations, which can be done in blocks to increase the arithmetic intensity and substantially boost the resulting performance. The batched GPU kernels are adaptive, allow nonuniform sizes in the matrices of the batch, and are more effective than SVD factorizations on matrices with fast decaying spectra. The hierarchical matrix generation consists of two phases, interleaved at every level of the matrix hierarchy. The first phase adaptively generates low rank approximations of matrix blocks through randomized matrix-vector sampling. The second phase accumulates and compresses these blocks into a hierarchical matrix that is incrementally constructed. The accumulation expresses the low rank blocks of a given level as a set of local low rank updates that are performed simultaneously on the whole matrix, allowing high-performance batched kernels to be used in the compression operations. When the ranks of the blocks generated in the first phase are too large to be processed in a single operation, the low rank updates can be split into smaller-sized updates and applied in sequence. Assuming representative rank \(k\), the resulting matrix has optimal \(O(kN)\) asymptotic storage complexity because of the nested bases it uses.
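To illustrate the sampling idea in the first phase (this is a minimal NumPy sketch of the standard randomized range finder for a symmetric matrix, not the paper's adaptive batched GPU implementation; the function name and the oversampling parameter `p` are illustrative assumptions):

```python
import numpy as np

def randomized_low_rank_sym(matvec, n, k, p=5):
    """Approximate a symmetric n x n matrix A, available only through
    matvec(M) = A @ M, by a rank-k eigendecomposition A ~ U diag(w) U.T.
    p is an oversampling parameter (hypothetical default, not from the paper)."""
    rng = np.random.default_rng(0)
    # Sample the range of A with one blocked matrix-vector product;
    # blocking raises arithmetic intensity, as the abstract notes.
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(matvec(Omega))      # orthonormal basis for range(A)
    # Project A onto the sampled subspace: a small (k+p) x (k+p) core.
    B = Q.T @ matvec(Q)
    w, V = np.linalg.eigh(B)
    idx = np.argsort(np.abs(w))[::-1][:k]   # keep the k dominant eigenpairs
    U = Q @ V[:, idx]
    return U, w[idx]
```

On a matrix with fast decaying spectrum, two blocked matvecs and one small dense eigensolve suffice; the batched variant in the paper runs many such factorizations concurrently on the GPU.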
The ability to generate an \(\mathcal{H}^2\) matrix from matrix-vector products allows us to support a general randomized matrix-matrix multiplication operation, an important kernel in hierarchical matrix computations. Numerical experiments demonstrate the high performance of the algorithms and their effectiveness in generating hierarchical matrices to a desired target accuracy.
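A hedged sketch of why matvec-only access enables randomized matrix-matrix multiplication: a product \(A = BB^T\) can be sampled through two dense matvecs per sweep, without ever forming \(A\) (the variable names below are illustrative; this is not the paper's \(\mathcal{H}^2\) construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4
B = rng.standard_normal((n, k))
# A = B @ B.T is applied via two matvecs; A itself is never formed.
A_matvec = lambda M: B @ (B.T @ M)

# The range sampler sees the implicit product exactly as it would see
# an explicit matrix: one blocked multiply per sampling sweep.
Q, _ = np.linalg.qr(A_matvec(rng.standard_normal((n, k))))

# Check how much of A the sampled basis captures (A formed here only
# to verify the sketch; rank(A) == k, so the residual should be tiny).
A = B @ B.T
residual = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
```

In the hierarchical setting, the same principle lets the product of two \(\mathcal{H}^2\) matrices be compressed directly from samples of the product's action.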
Keywords: hierarchical matrices, Other programming paradigms (object-oriented, sequential, concurrent, automatic, etc.), matrix compression, randomized algorithms, Numerical linear algebra, GPU, Approximation algorithms, low rank factorization, nested bases, Parallel algorithms in computer science, batched algorithms, low rank updates, matrix-matrix multiplication
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | Citations derived from selected sources; an alternative to the "influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 11 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
