publication . Preprint . 2013

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

Hofmann, Johannes; Treibig, Jan; Hager, Georg; Wellein, Gerhard
Open Access English
  • Published: 17 Dec 2013
Abstract
We examine the Xeon Phi, which is based on Intel's Many Integrated Core architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and the means to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we perform various micro-benchmarks to investigate the platform's new set of vector instructions and put a special emphasis on the newly introduced vector gathe...
Subjects
free text keywords: Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Performance