
arXiv: 2508.04917
Sparse linear systems are typically solved using preconditioned iterative methods, but applying preconditioners via sparse triangular solves introduces bottlenecks due to irregular memory accesses and data dependencies. This work leverages fine-grained domain decomposition to adapt triangular solves to the GPU architecture. We develop a fine-grained domain decomposition strategy that generates non-overlapping subdomains, increasing parallelism in the application of the preconditioner at the expense of a modest increase in the iteration count needed for convergence. Each subdomain is assigned to a thread block and is sized such that the subdomain vector fits in the GPU shared memory, eliminating the need for inter-block synchronization and reducing irregular global memory accesses. Compared to other state-of-the-art implementations using the ROCm$^{\text{TM}}$ software stack, we achieve a 10.7$\times$ speedup for triangular solves and a 3.2$\times$ speedup for the ILU0-preconditioned biconjugate gradient stabilized (BiCGSTAB) solver on the AMD Instinct$^{\text{TM}}$ MI210 GPU.
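
The idea in the abstract can be illustrated with a small CPU-side sketch. It is not the authors' implementation: the contiguous row partition, the fixed `block_size`, the CSR layout, and the function names `subdomain_lower_solve` and `decomposed_lower_solve` are all assumptions made for illustration. The sketch treats each subdomain as an independent lower-triangular solve and simply drops entries that couple a row to another subdomain, which is one plausible reading of "non-overlapping subdomains" without inter-block synchronization. On a GPU, each iteration of the outer loop would map to one thread block, and `x_local` would live in shared memory.

```python
def subdomain_lower_solve(indptr, indices, data, b, row_start, row_end):
    """Forward substitution on rows [row_start, row_end) of a CSR lower-triangular matrix.

    Entries whose column falls outside the subdomain are ignored (assumption:
    cross-subdomain couplings are dropped, so the result is approximate).
    """
    # Stand-in for the per-thread-block subdomain vector held in shared memory.
    x_local = [0.0] * (row_end - row_start)
    for i in range(row_start, row_end):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif row_start <= j < i:
                acc -= data[k] * x_local[j - row_start]
            # j < row_start: coupling to another subdomain, dropped here.
        if diag is None or diag == 0.0:
            raise ValueError(f"row {i} has no usable diagonal entry")
        x_local[i - row_start] = acc / diag
    return x_local


def decomposed_lower_solve(indptr, indices, data, b, block_size):
    """Approximate solve of L x = b using independent contiguous subdomains."""
    n = len(b)
    x = [0.0] * n
    # Each iteration is independent of the others (one thread block each on a GPU).
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        x[start:end] = subdomain_lower_solve(indptr, indices, data, b, start, end)
    return x


# Hypothetical 4x4 bidiagonal lower-triangular matrix in CSR form:
# diagonal entries are 2, sub-diagonal entries are 1.
indptr = [0, 1, 3, 5, 7]
indices = [0, 0, 1, 1, 2, 2, 3]
data = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
b = [2.0, 4.0, 6.0, 8.0]

exact = decomposed_lower_solve(indptr, indices, data, b, block_size=4)
approx = decomposed_lower_solve(indptr, indices, data, b, block_size=2)
```

With a single subdomain (`block_size=4`) the result is the exact forward substitution. With two subdomains the coupling between rows 1 and 2 is dropped, so `approx` differs from `exact` in the second block. That loss of accuracy is the kind of trade-off the abstract describes as a modest increase in iteration count in exchange for more parallelism.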
14 pages, 14 figures
Performance (cs.PF), FOS: Computer and information sciences, Numerical Analysis, Performance, G.1.3; D.1.3, FOS: Mathematics, Numerical Analysis (math.NA)
