Article, Preprint (22 Feb 2022)
Embargo end date: 01 Jan 2021
Publisher: ACM
Journal: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Funded by: NSF | Collaborative Research: F...

Authors: Abhinav Jangda; Jun Huang; Guodong Liu; Amir Hossein Nodehi Sabet; Saeed Maleki; Youshan Miao; Madanlal Musuvathi; and 2 more authors

DOI: 10.1145/3503222.3507778, 10.48550/arxiv.2105.05720

arXiv: 2105.05720

Breaking the computation and communication abstraction barrier in distributed machine learning workloads


Abstract

The recent trend toward increasingly large machine learning models requires both training and inference to be distributed. Given the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain the best performance. However, the current logical separation between computation and communication kernels in deep learning frameworks misses optimization opportunities across this barrier. Breaking this abstraction and considering computation and communication holistically enables many optimizations that improve the performance of distributed workloads. Applying these optimizations manually requires modifying the underlying computation and communication libraries for each scenario, which is time-consuming and error-prone. Therefore, we present CoCoNeT, which provides a DSL for expressing programs that contain both computation and communication. CoCoNeT contains several machine-learning-aware transformations to optimize a program and a compiler to generate high-performance kernels. Providing both computation and communication as first-class constructs allows users to work at a high level of abstraction and apply powerful optimizations, such as fusing or overlapping communication and computation. CoCoNeT enables us to optimize data-, model-, and pipeline-parallel workloads in large language models with only a few lines of code. Experiments show that CoCoNeT significantly outperforms state-of-the-art distributed machine learning implementations.
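As an illustration of the kind of computation/communication fusion the abstract mentions, the following Python sketch simulates N ranks in a single process. It relies on the standard identity that an AllReduce can be decomposed into a ReduceScatter followed by an AllGather; an elementwise operation applied after the AllReduce can therefore be moved between the two collectives so that each rank processes only its 1/N shard. This is a hypothetical, simplified model: the function names (`all_reduce`, `reduce_scatter`, `all_gather`, `unfused`, `fused`) are illustrative and are not CoCoNeT's DSL or API, no real communication takes place, and the rewrite is only valid for elementwise operations.

```python
"""Hypothetical sketch (not CoCoNeT's API): AllReduce followed by an
elementwise op, versus the fused form ReduceScatter -> op on the local
shard -> AllGather. Ranks are simulated as Python lists in one process."""

from typing import Callable, List

Tensor = List[float]


def all_reduce(inputs: List[Tensor]) -> List[Tensor]:
    """Every rank receives the elementwise sum over all ranks."""
    total = [sum(vals) for vals in zip(*inputs)]
    return [list(total) for _ in inputs]


def reduce_scatter(inputs: List[Tensor]) -> List[Tensor]:
    """Rank r receives only the r-th contiguous shard of the elementwise sum."""
    n = len(inputs)
    size = len(inputs[0])
    assert size % n == 0, "tensor length must divide evenly across ranks"
    shard = size // n
    total = [sum(vals) for vals in zip(*inputs)]
    return [total[r * shard:(r + 1) * shard] for r in range(n)]


def all_gather(shards: List[Tensor]) -> List[Tensor]:
    """Every rank receives the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def unfused(inputs: List[Tensor], op: Callable[[float], float]) -> List[Tensor]:
    """AllReduce, then every rank applies `op` to the whole tensor."""
    reduced = all_reduce(inputs)
    return [[op(x) for x in t] for t in reduced]


def fused(inputs: List[Tensor], op: Callable[[float], float]) -> List[Tensor]:
    """ReduceScatter, apply `op` only to the local 1/N shard, then AllGather.
    Equivalent to `unfused` only when `op` is elementwise."""
    shards = reduce_scatter(inputs)
    updated = [[op(x) for x in s] for s in shards]
    return all_gather(updated)


def halve(x: float) -> float:
    """Example elementwise op, e.g. scaling a gradient by a learning rate of 0.5."""
    return x / 2


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # two simulated ranks
    print(fused(grads, halve))  # [[3.0, 4.0, 5.0, 6.0], [3.0, 4.0, 5.0, 6.0]]
```

Both versions produce the same result on every rank. In `unfused`, each rank applies `op` to every element; in `fused`, each rank applies it to only len/N elements before the AllGather, which is the compute saving that motivates this kind of cross-barrier transformation.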

Related Organizations

Chinese Academy of Sciences, China (People's Republic of)
University of Massachusetts System, United States
The Ohio State University, United States
Northeastern University, United States
University of California System, United States


Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Programming Languages, Computer Science - Distributed, Parallel, and Cluster Computing, Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG), Programming Languages (cs.PL)

Related research products (4)

cutlass: software on GitHub (IsRelatedTo)
DeepSpeed: software on GitHub (IsRelatedTo)
apex: software on GitHub (IsRelatedTo)
CoCoNet: software on GitHub (IsRelatedTo)

Impact by BIP!

Selected citations: 41. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.