
Abstract

Solving sparse triangular systems of linear equations is a performance bottleneck in many methods for solving more general sparse systems. In both direct methods and many iterative preconditioners, triangular solves are used to solve the system or to improve an approximate solution, often across many iterations. Triangular solves are notoriously resistant to parallelism, however, and existing parallel linear algebra packages appear to be ineffective at exploiting significant parallelism for this problem. We develop a novel parallel algorithm based on heuristics that adapt to the structure of the matrix and extract parallelism that conventional methods leave unexploited. By analyzing and reordering operations, our algorithm can often extract parallelism even when most of the nonzero matrix entries lie near the diagonal. Our main parallelization strategies are: (1) identify independent rows, (2) send data earlier to achieve greater overlap, and (3) process dense off-diagonal regions in parallel. We describe the implementation of our algorithm in Charm++ and MPI and present promising experimental results on up to 512 cores of BlueGene/P, using numerous sparse matrices from real applications.
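To make the first strategy concrete, the sketch below illustrates the classical level-scheduling idea behind "identify independent rows": rows of a lower-triangular matrix whose dependencies have already been solved can be processed concurrently within a level. This is only an illustrative C++ sketch under assumed CSR storage (the names CSRLower, buildLevels, and solveByLevels are hypothetical), not the authors' Charm++/MPI implementation or their adaptive heuristics.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed lower-triangular matrix in CSR form, with the diagonal entry
// stored last in each row (an assumption of this sketch, not the paper).
struct CSRLower {
    std::vector<std::size_t> rowptr;  // size n+1
    std::vector<std::size_t> col;     // column indices, diagonal last per row
    std::vector<double>      val;     // values matching col
};

// Group rows into "levels": level(i) = 1 + max level of its off-diagonal
// dependencies. All rows within one level are mutually independent.
std::vector<std::vector<std::size_t>> buildLevels(const CSRLower& L, std::size_t n) {
    std::vector<std::size_t> level(n, 0);
    std::size_t maxLevel = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t lv = 0;
        for (std::size_t k = L.rowptr[i]; k + 1 < L.rowptr[i + 1]; ++k)  // skip diagonal
            lv = std::max(lv, level[L.col[k]] + 1);
        level[i] = lv;
        maxLevel = std::max(maxLevel, lv);
    }
    std::vector<std::vector<std::size_t>> levels(maxLevel + 1);
    for (std::size_t i = 0; i < n; ++i)
        levels[level[i]].push_back(i);
    return levels;
}

// Solve L x = b one level at a time. Rows inside a level are independent,
// so the inner loop is the natural place for parallelism (e.g. an OpenMP
// parallel for, or distribution across workers in a runtime like Charm++).
void solveByLevels(const CSRLower& L, const std::vector<double>& b,
                   std::vector<double>& x) {
    const std::size_t n = b.size();
    x.assign(n, 0.0);
    const auto levels = buildLevels(L, n);
    for (const auto& rows : levels) {
        for (std::size_t i : rows) {          // independent rows: parallelizable
            double s = b[i];
            const std::size_t diag = L.rowptr[i + 1] - 1;
            for (std::size_t k = L.rowptr[i]; k < diag; ++k)
                s -= L.val[k] * x[L.col[k]];
            x[i] = s / L.val[diag];
        }
    }
}
```

The sketch omits the abstract's other two strategies (earlier data sends for communication overlap and parallel handling of dense off-diagonal regions), which are specific to the paper's distributed-memory design.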
