doi: 10.1145/3613424.3614303, 10.5281/zenodo.8311888, 10.5281/zenodo.8311889, 10.48550/arxiv.2311.12862
arXiv: 2311.12862
handle: 1721.1/153260
Sparse convolution is an important workload for AR/VR and autonomous driving (ADAS). It involves sparse, irregular computation patterns and therefore requires specialized high-performance kernels. Existing GPU libraries offer two dataflows for this workload: the gather-GEMM-scatter dataflow is easy to implement but suboptimal in performance, while dataflows that overlap computation and memory access (e.g., implicit GEMM) are highly performant but carry very high engineering costs. In this work, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that produces performant sparse point cloud convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing point cloud libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x, and 1.7x measured end-to-end inference speedups on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse, and SpConv v2, respectively, and is 1.2-1.3x faster than SpConv v2 in mixed-precision training.
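The gather-GEMM-scatter dataflow mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration of the idea, not TorchSparse++'s actual API: the function name, the kernel-map layout, and the use of CPU NumPy in place of GPU kernels are all assumptions made for clarity. For each kernel offset, rows of the input feature matrix are gathered, multiplied by that offset's weight matrix with a dense GEMM, and scatter-added into the output.

```python
import numpy as np

def gather_gemm_scatter(feats, weights, kmaps, n_out):
    """Sparse convolution via the gather-GEMM-scatter dataflow (illustrative sketch).

    feats:   (N_in, C_in) input point features
    weights: (K, C_in, C_out) one weight matrix per kernel offset
    kmaps:   list of K (in_idx, out_idx) index-array pairs, mapping which
             input point contributes to which output point at each offset
    n_out:   number of output points
    """
    c_out = weights.shape[2]
    out = np.zeros((n_out, c_out), dtype=feats.dtype)
    for k, (in_idx, out_idx) in enumerate(kmaps):
        if len(in_idx) == 0:
            continue                         # no active pairs at this offset
        gathered = feats[in_idx]             # gather: irregular memory read
        partial = gathered @ weights[k]      # dense GEMM on the gathered rows
        np.add.at(out, out_idx, partial)     # scatter-add: irregular write
    return out
```

The simplicity of this loop is what makes the dataflow cheap to implement; its cost is that the gather and scatter steps are separate memory-bound passes, which is exactly the overhead that overlapped dataflows such as implicit GEMM avoid.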
FOS: Computer and information sciences. arXiv subjects: Machine Learning (cs.LG), Computer Vision and Pattern Recognition (cs.CV), Performance (cs.PF), Distributed, Parallel, and Cluster Computing (cs.DC). Keywords: sparsity, sparse convolution, point cloud, GPU, high-performance computing.
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator below. | 16 |
| Popularity | Reflects the "current" impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | Reflects the overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | Reflects the initial momentum of the article directly after its publication, based on the underlying citation network. | Top 10% |
| Views | | 39 |
| Downloads | | 5 |

Views provided by UsageCounts
Downloads provided by UsageCounts