publication . Preprint . Conference object . 2017

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

Issaku Kanamori; Hideo Matsufuru;
Open Access English
  • Published: 05 Dec 2017
Abstract
We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of numerical simulations of lattice QCD is solving a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of KNL, such as using intrinsics and manual prefetching, and apply them to the matrix multiplication and iterative solver algorithms. Based on the performance measured on the Oakforest-PACS system, we discuss the performance tuning on KNL as well as the code design for facili...
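To make the "iterative solver for a large sparse matrix" concrete, the following is a minimal C++ sketch of a conjugate gradient (CG) solver built around a matrix-free sparse multiplication. It is not the authors' code and does not use the Bridge++ API. The operator is a hypothetical stand-in (a nearest-neighbour operator with a mass term on a periodic 1D lattice) chosen only because it is sparse, symmetric, and positive definite; the names `apply_operator`, `dot`, and `solve_cg` are illustrative assumptions. The prefetch hint uses the GCC/Clang builtin `__builtin_prefetch` as a portable placeholder for the manual prefetching mentioned above; a KNL-tuned kernel would more likely use AVX-512 intrinsics, which are omitted here so that the sketch compiles on any machine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for the sparse matrix multiplication:
// y[i] = (2 + mass) * x[i] - x[i+1] - x[i-1] on a periodic 1D lattice.
// For mass > 0 this operator is symmetric positive definite.
void apply_operator(const Vec& x, Vec& y, double mass) {
    const std::size_t n = x.size();
    assert(y.size() == n && n >= 3);
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // Manual prefetch hint a few sites ahead (assumed distance of 8).
        if (i + 8 < n) __builtin_prefetch(&x[i + 8]);
#endif
        const std::size_t ip = (i + 1 == n) ? 0 : i + 1;  // periodic neighbour
        const std::size_t im = (i == 0) ? n - 1 : i - 1;  // periodic neighbour
        y[i] = (2.0 + mass) * x[i] - x[ip] - x[im];
    }
}

// Plain inner product of two vectors of equal length.
double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Conjugate gradient for A x = b, starting from x = 0.
// Returns the number of iterations used, or -1 if max_iter is reached.
int solve_cg(const Vec& b, Vec& x, double mass, double tol, int max_iter) {
    const std::size_t n = b.size();
    x.assign(n, 0.0);
    Vec r = b;   // residual r = b - A x, with x = 0
    Vec p = b;   // search direction
    Vec ap(n);   // workspace for A p
    double rr = dot(r, r);
    const double bb = rr;
    if (bb == 0.0) return 0;  // trivial right-hand side
    for (int iter = 1; iter <= max_iter; ++iter) {
        apply_operator(p, ap, mass);
        const double alpha = rr / dot(p, ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        const double rr_new = dot(r, r);
        if (std::sqrt(rr_new / bb) < tol) return iter;  // relative residual
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return -1;
}
```

In a solver of this shape, nearly all the time goes into `apply_operator` and the vector updates, which is why tuning effort on a SIMD architecture would concentrate on those loops. A call such as `solve_cg(b, x, 0.1, 1e-10, 1000)` with a 64-site right-hand side is enough to exercise the sketch.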
Subjects
arXiv: Computer Science::Hardware Architecture, High Energy Physics::Lattice, Computer Science::Mathematical Software, Computer Science::Performance
free text keywords: High Energy Physics - Lattice, Physics - Computational Physics, Computer science, Matrix multiplication, Xeon Phi, Intrinsics, Parallel computing, Solver, Massively parallel, Sparse matrix, Lattice QCD, Performance tuning
