GPU Interconnect Benchmarking on 8x NVIDIA A100-SXM4-80GB with NVLink and Kubeflow

Ozdemir, Yagmur Idil

Software

Data sources: ZENODO

Research software. Under curation. Publisher: Zenodo

Authors: Ozdemir, Yagmur Idil

doi: 10.5281/zenodo.20560283

Abstract

Performance evaluation of 8x NVIDIA A100-SXM4-80GB GPUs interconnected via NVSwitch (NV12) on UCL ARC's Kubeflow platform. The benchmarking suite includes: (1) NVBandwidth point-to-point GPU transfer measurements comparing bare metal vs. Kubeflow, NVLink enabled vs. disabled, and A100-SXM4 vs. A100-PCIe configurations; (2) NCCL collective communication benchmarks (all-reduce, all-gather, broadcast, reduce-scatter, send-recv) with analysis of bus bandwidth scaling, GPU count scaling, thread count impact, and protocol/algorithm variants; and (3) P2P bandwidth and latency tests via CUDA samples across NVLink and PCIe. Statistical analysis using z-scores identifies minor per-GPU performance asymmetries attributable to NVSwitch topology rather than systemic bottlenecks. NVLink provides a 14-15x bandwidth improvement over PCIe-only communication.
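As an illustration of two quantities named in the abstract, the sketch below shows how a bus bandwidth figure is commonly derived from a measured algorithm bandwidth, and how per-GPU z-scores can flag an asymmetric device. It is not taken from the deposited software. The correction factors follow the convention documented for the nccl-tests suite (send-recv is left out here). The function names, the sample bandwidth values, and the |z| > 2 threshold are hypothetical choices made only for this example.

```python
import statistics

# Bus bandwidth correction factors as described in the nccl-tests
# performance notes: busbw = algbw * factor(n), with n the number of ranks.
BUS_FACTORS = {
    "all_reduce": lambda n: 2 * (n - 1) / n,
    "all_gather": lambda n: (n - 1) / n,
    "reduce_scatter": lambda n: (n - 1) / n,
    "broadcast": lambda n: 1.0,
}


def bus_bandwidth(collective, alg_bw_gbs, n_ranks):
    """Convert algorithm bandwidth (GB/s) to bus bandwidth (GB/s)."""
    return alg_bw_gbs * BUS_FACTORS[collective](n_ranks)


def z_scores(values):
    """Return the z-score of each value, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: no asymmetry to report.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def flag_outliers(values, threshold=2.0):
    """Return indices whose |z-score| exceeds the threshold (assumed cutoff)."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


if __name__ == "__main__":
    # Hypothetical per-GPU bandwidths in GB/s, NOT measurements from this record.
    per_gpu_bw = [250.0, 251.0, 249.5, 250.5, 250.0, 238.0, 250.2, 249.8]
    print("all-reduce busbw:", bus_bandwidth("all_reduce", 100.0, 8))
    print("flagged GPU indices:", flag_outliers(per_gpu_bw))
```

Because the bus bandwidth factor removes the dependence on rank count, results at different GPU counts can be compared against the same hardware peak, which is what makes a GPU count scaling analysis meaningful.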
