publication . Preprint . 2013

An Empirical Study of Intel Xeon Phi

Fang, Jianbin; Varbanescu, Ana Lucia; Sips, Henk; Zhang, Lilun; Che, Yonggang; Xu, Chuanfu
Open Access English
  • Published: 22 Oct 2013
Abstract
With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOPs of compute and over 240 GB/s of memory bandwidth. These numbers, together with its flexibility (it can be used either as a coprocessor or as a stand-alone processor), are very tempting for parallel applications looking for new performance records. In this paper, we present an empirical study of Xeon Phi, stressing its performance limits and relevant performance factors, ultimately aiming to present a simplified view of the machine for regular programmers in search of performance. To do so...
Subjects
free text keywords: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance