
arXiv: 2203.02798
We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack , a publicly available Python package 1 whose core is written in C++ and parallelized with OpenMP and that is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
FOS: Computer and information sciences, sparse data structures, 65F50, 65F08, 65F20, 68W10, 68W20, 68P05, G.3, parallel algorithms, statistical leverage scores, Computer Science - Distributed, Parallel, and Cluster Computing, preconditioning, F.2.1; G.3; E.1, Computer Science - Data Structures and Algorithms, sketching, regression, Computer Science - Mathematical Software, Data Structures and Algorithms (cs.DS), F.2.1, Distributed, Parallel, and Cluster Computing (cs.DC), E.1, column subset selection, Mathematical Software (cs.MS), Numerical analysis
FOS: Computer and information sciences, sparse data structures, 65F50, 65F08, 65F20, 68W10, 68W20, 68P05, G.3, parallel algorithms, statistical leverage scores, Computer Science - Distributed, Parallel, and Cluster Computing, preconditioning, F.2.1; G.3; E.1, Computer Science - Data Structures and Algorithms, sketching, regression, Computer Science - Mathematical Software, Data Structures and Algorithms (cs.DS), F.2.1, Distributed, Parallel, and Cluster Computing (cs.DC), E.1, column subset selection, Mathematical Software (cs.MS), Numerical analysis
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 2 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
