We present a strategy to speed up Runge-Kutta-based ODE simulations of large systems with nearest-neighbor coupling. We identify the cache/memory bandwidth as the crucial performance bottleneck. To reduce the required bandwidth, we introduce a granularity in the simulat...