
We consider computing tall-skinny QR factorizations on a large-scale parallel machine. We present a realistic performance model and analyze the difference of the parallel execution time between Householder QR and TSQR. Our analysis indicates the possibility that TSQR becomes slower than Householder QR as the number of columns of the target matrix increases. We aim for estimating the difference and selecting the faster algorithm by using models, which falls into auto-tuning. Numerical experiments on the K computer support our analysis and show our success in determining the faster algorithm.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 6 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
