Powered by OpenAIRE graph
ZENODO
Report
Data sources: ZENODO

The Real Limits of Distributed LLM Training

Authors: Morrison, Sterling


Abstract

We analyze a federated, peer-to-peer LLM training architecture that uses delta compression, BitTorrent-style chunked model distribution, and hierarchical merging to coordinate training across thousands of consumer GPUs. The architecture is internally coherent and contains several non-trivial engineering decisions worth documenting; it is also, for the intended use case of training frontier-scale language models, the wrong shape of the problem. We characterize seven concrete failure modes – bandwidth, straggler effect, FedAvg convergence under non-IID data, the consumer-VRAM ceiling, total cost of training, the security envelope of the delta-validation rules, and data provenance – each paired with a reproducible Python script. The conclusion is that for frontier-scale models the centralized cluster is faster, cheaper, and safer by enough that distributed federated training is economically and mathematically dominated. We close with a short list of regimes where federated training remains the right tool.
