
This paper addresses the parallelization of a recursive algorithm for large-scale triangular matrix inversion based on the 'Divide and Conquer' (D&C) paradigm. Several versions of an original sequential algorithm are first presented, and a theoretical performance study allows an accurate comparison between them. In the second part of the paper, we develop an optimal communication-avoiding parallel algorithm for a given number of available processors, whether homogeneous or heterogeneous. To this end, we use a so-called 'non-equitable and incomplete' version of the D&C paradigm, which recursively decomposes the original problem into two sub-problems of unequal sizes and then further decomposes only one of them in the same manner. The theoretical study is validated by a series of experiments carried out on three target platforms, namely an 8-core shared-memory machine, a distributed-memory cluster, and a heterogeneous CPU-GPU cluster. The results illustrate the benefit of the contribution.
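
To make the decomposition concrete, the following is a minimal sequential sketch, not the algorithm of the paper. It relies on the standard block identity for a lower triangular matrix L = [[A, 0], [C, B]], whose inverse is [[A⁻¹, 0], [−B⁻¹ C A⁻¹, B⁻¹]]. The names `invert_direct`, `invert_dc`, `ratio` and `leaf` are hypothetical, and the choice to invert the leading block directly while recursing only on the trailing block is an assumption made for illustration; the abstract does not specify which sub-problem is decomposed further, nor how work is mapped to processors.

```python
def matmul(X, Y):
    """Plain dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def invert_direct(L):
    """Invert a lower triangular matrix column by column (forward substitution)."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            s = sum(L[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / L[i][i]
    return inv


def invert_dc(L, ratio=0.5, leaf=1):
    """Recursive inversion of a lower triangular matrix.

    ratio: fraction of rows given to the leading block A (hypothetical
           parameter; ratio != 0.5 gives sub-problems of unequal sizes).
    leaf:  size at or below which the direct method is used (must be >= 1).
    """
    n = len(L)
    if n <= leaf:
        return invert_direct(L)
    k = min(n - 1, max(1, round(ratio * n)))  # split point, 1 <= k <= n - 1
    A = [row[:k] for row in L[:k]]
    C = [row[:k] for row in L[k:]]
    B = [row[k:] for row in L[k:]]
    A_inv = invert_direct(A)            # assumption: this block is not decomposed
    B_inv = invert_dc(B, ratio, leaf)   # only this sub-problem is decomposed again
    X = matmul(matmul(B_inv, C), A_inv)
    top = [A_inv[i] + [0.0] * (n - k) for i in range(k)]
    bottom = [[-x for x in X[i]] + B_inv[i] for i in range(n - k)]
    return top + bottom
```

For example, `invert_dc(L, ratio=0.25)` assigns roughly a quarter of the rows to the directly inverted block at each level. In a parallel setting the two inversions and the two matrix products are the natural units of work, but any such mapping is outside what this sketch shows.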
