
arXiv: 2408.05152
handle: 20.500.12876/EzR266gz
Matrix computations are a fundamental building block of edge computing systems, and demand for them has risen sharply in recent years due to their use in AI/ML training and inference. Existing approaches for distributing matrix computations allocate coded combinations of submatrices to worker nodes in order to build resilience against slow nodes, called stragglers. In the edge learning context, however, these approaches compromise the sparsity that is often present in the original matrices held at the edge server. In this study, we consider the challenge of augmenting such approaches to preserve input sparsity when distributing the task across edge devices, thereby retaining the associated gains in computational efficiency. First, we find a lower bound on the weight of the coding, i.e., the number of submatrices combined to obtain each coded submatrix, required to provide resilience to the maximum possible number of straggling devices (for a given number of devices and their storage constraints). Next, we propose distributed matrix computation schemes that meet this lower bound on the coding weight exactly. Numerical experiments conducted on Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices.
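To make the trade-off described in the abstract concrete, here is a minimal sketch, assuming a generic Vandermonde (MDS-style) code over exact rationals for matrix-vector multiplication. It is not the scheme proposed in the paper. The matrix is split into k row blocks A_j, and worker i stores the coded block C_i = sum_j (i+1)^j A_j. Any k of the n workers then suffice to recover every A_j x, so up to n - k stragglers are tolerated. However, each coded block combines all k submatrices (coding weight k), so it contains more nonzeros than any original block. All function names (`encode`, `decode`, `matvec`, `nnz`) and the example data are hypothetical.

```python
from fractions import Fraction

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def nnz(M):
    """Count nonzero entries: a proxy for the work a sparse kernel would do."""
    return sum(1 for row in M for v in row if v != 0)

def encode(blocks, n_workers):
    """Hypothetical Vandermonde encoding: worker i stores sum_j (i+1)^j * A_j.
    Every coded block mixes all k blocks, i.e. the coding weight is k."""
    k = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    coeffs = [[Fraction(i + 1) ** j for j in range(k)] for i in range(n_workers)]
    coded = [[[sum(g[j] * blocks[j][r][c] for j in range(k))
               for c in range(cols)] for r in range(rows)] for g in coeffs]
    return coeffs, coded

def solve(G, b):
    """Solve G y = b by Gauss-Jordan elimination over exact fractions."""
    n = len(G)
    aug = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def decode(coeffs, results, k):
    """Recover A_j x for every block j from any k finished workers.
    `results` maps worker index -> that worker's product C_i x."""
    done = sorted(results)[:k]
    G = [coeffs[i] for i in done]  # k x k Vandermonde system, invertible
    n_rows = len(results[done[0]])
    per_block = [[None] * n_rows for _ in range(k)]
    for r in range(n_rows):
        y = solve(G, [results[i][r] for i in done])
        for j in range(k):
            per_block[j][r] = y[j]
    return per_block
```

For example, with three sparse 2x3 blocks and five workers, workers 1 and 3 can straggle and the products are still recovered from workers 0, 2 and 4. Each coded block has 4 nonzeros, while no original block has more than 2, which is the loss of sparsity that low-weight coding aims to limit.

```python
blocks = [[[1, 0, 0], [0, 0, 2]],
          [[0, 3, 0], [0, 0, 0]],
          [[0, 0, 0], [4, 0, 0]]]
x = [1, 2, 3]
coeffs, coded = encode(blocks, n_workers=5)
results = {i: matvec(coded[i], x) for i in (0, 2, 4)}  # workers 1, 3 straggle
recovered = decode(coeffs, results, k=3)
```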
arXiv admin note: text overlap with arXiv:2308.04331
FOS: Computer and information sciences, Stragglers, Computer Science - Distributed, Parallel, and Cluster Computing, MDS Codes, DegreeDisciplines::Engineering::Electrical and Computer Engineering, IoT/edge heterogeneity, Distributed, Parallel, and Cluster Computing (cs.DC), Sparsity, DegreeDisciplines::Engineering::Computational Engineering, Distributed computing, 004, 510
