Publication · Other literature type · Article · Preprint · 2019

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

Ang Li; Shuaiwen Leon Song; Jieyang Chen; Jiajia Li; Xu Liu; Nathan R. Tallent; Kevin J. Barker
  • Published: 11 Mar 2019
  • Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Comment: 15 pages. The paper will be submitted to IEEE Transactions on Parallel and Distributed Systems (TPDS).
Subjects
Free-text keywords: Signal Processing, Hardware and Architecture, Computational Theory and Mathematics, PCI Express, Distributed computing, Chipset, Performance tuning, Server, Computer science, Network topology, Cloud computing, Computer architecture, Scheduling (computing), Big data, Computer Science - Hardware Architecture, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Networking and Internet Architecture, Computer Science - Performance
60 references (references 1 to 15 shown)

[1] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He, “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour,” arXiv preprint arXiv:1706.02677, 2017.

[2] O. Fuhrer, T. Chadha, T. Hoefler, G. Kwasniewski, X. Lapillonne, D. Leutwyler, D. Lüthi, C. Osuna, C. Schär, T. C. Schulthess et al., “Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0,” Geoscientific Model Development, 2018.

[3] H. Mikami, H. Suganuma, P. U-chupala, Y. Tanaka, and Y. Kageyama, “Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash,” arXiv preprint arXiv:1811.05233, 2018.

[4] NVIDIA, “NVIDIA DGX-1 System Architecture White Paper,” 2017.

[5] NVIDIA, “NVIDIA DGX-2H The World's Most Powerful System for The Most Complex AI Challenges,” 2018.

[6] OLCF, “Summit: The Next Leap in Leadership-Class Computing Systems for Open Science,” https://www.olcf.ornl.gov/for-users/system-user-guides/summit/.

[7] OLCF, “Sierra: Supporting NNSA's stockpile stewardship mission through simulation in lieu of underground testing,” https://computation.llnl.gov/computers/sierra.

[8] S. Pabst, A. Koch, and W. Straßer, “Fast and scalable CPU/GPU collision detection for rigid and deformable surfaces,” in Computer Graphics Forum. Wiley Online Library, 2010.

[9] Q. Xu, H. Jeon, and M. Annavaram, “Graph processing on GPU: Where are the bottlenecks?” in International Symposium on Workload Characterization (IISWC). IEEE, 2014.

[10] “The System Bottleneck Shifts to PCI-Express,” https://www.nextplatform.com/2017/07/14/system-bottleneck-shifts-pci-express/.

[11] D. Ziakas, A. Baum, R. A. Maddox, and R. J. Safranek, “Intel® QuickPath Interconnect architectural features supporting scalable system architectures,” in High Performance Interconnects (HOTI), 2010 IEEE 18th Annual Symposium on. IEEE, 2010.

[12] D. Foley and J. Danskin, “Ultra-Performance Pascal GPU and NVLink Interconnect,” IEEE Micro, 2017.

[13] NVIDIA, “SLI best practices,” Tech. Rep., 2007.

[14] AMD, “ATI CrossFire Pro User Guide,” Tech. Rep., 2009.

[15] G. Kim, M. Lee, J. Jeong, and J. Kim, “Multi-GPU system design with memory networks,” in 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2014.
