
arXiv: 2105.01976
Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern platforms remains challenging due to the irregular data dependencies. The execution performance can be improved by efficiently partitioning the graphs such that the communication and thread synchronization overheads are minimized without hurting the utilization of the threads. To achieve this, this paper proposes GRAPHOPT, a tool that models the graph parallelization as a constrained optimization problem and uses the open Google OR-Tools solver to find good partitions. Several scalability techniques are developed to handle large real-world graphs with millions of nodes and edges. Extensive experiments are performed on the graphs of sparse matrix triangular solves (linear algebra) and sum-product networks (machine learning), respectively, showing a mean speedup of 2.0X and 1.8X over previous state-of-the-art libraries, demonstrating the effectiveness of the constrained-optimization-based graph parallelization.
Optimization, FOS: Computer and information sciences, Technology, 0805 Distributed Computing, 4606 Distributed computing and systems software, Synchronization, Instruction sets, Engineering, Hardware, Computer Science, Theory & Methods, partitioning, Parallel processing, 1005 Communications Technologies, CPU multithreading, Science & Technology, Scalability, 0803 Computer Software, Engineering, Electrical & Electronic, constrained optimization, sparse matrix triangular solves, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science, Task analysis, Graph parallelization, Distributed, Parallel, and Cluster Computing (cs.DC), Distributed Computing
Optimization, FOS: Computer and information sciences, Technology, 0805 Distributed Computing, 4606 Distributed computing and systems software, Synchronization, Instruction sets, Engineering, Hardware, Computer Science, Theory & Methods, partitioning, Parallel processing, 1005 Communications Technologies, CPU multithreading, Science & Technology, Scalability, 0803 Computer Software, Engineering, Electrical & Electronic, constrained optimization, sparse matrix triangular solves, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science, Task analysis, Graph parallelization, Distributed, Parallel, and Cluster Computing (cs.DC), Distributed Computing
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
