
arXiv: 1503.08809
We present a case study describing efforts to optimise and modernise "Modal", the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. This separable method can become unstable in certain cases and so the slower non-separable integral must be calculated instead. We present a discussion of the optimisation of both approaches. We show significant speed-ups of ~100x, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3x and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38x. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.
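The gain from separability described in the abstract can be illustrated with a toy example. The sketch below is not the Modal code and does not reproduce the paper's reduction to a one-dimensional summation plus two integrations. It assumes a rectangular grid and hypothetical factor functions, and shows only the general principle: when both functions factorise along each axis, a triple sum collapses into a product of three one-dimensional sums. All function and variable names are illustrative assumptions.

```python
import itertools
import math


def inner_product_direct(f, g, xs, ys, zs):
    """Brute-force triple sum of f*g over every point of a rectangular grid."""
    return sum(f(x, y, z) * g(x, y, z)
               for x, y, z in itertools.product(xs, ys, zs))


def inner_product_separable(f_factors, g_factors, xs, ys, zs):
    """Same sum when f = fx*fy*fz and g = gx*gy*gz: three 1D sums, then a product."""
    fx, fy, fz = f_factors
    gx, gy, gz = g_factors
    sx = sum(fx(x) * gx(x) for x in xs)
    sy = sum(fy(y) * gy(y) for y in ys)
    sz = sum(fz(z) * gz(z) for z in zs)
    return sx * sy * sz


# Hypothetical one-dimensional factors, chosen only for illustration.
f_factors = (math.sin, math.cos, lambda t: t * t)
g_factors = (math.exp, lambda t: 1.0 + t, math.cos)


def f(x, y, z):
    return f_factors[0](x) * f_factors[1](y) * f_factors[2](z)


def g(x, y, z):
    return g_factors[0](x) * g_factors[1](y) * g_factors[2](z)


# Uniform grid on [0, 1] along each axis (an assumption of this sketch).
grid = [i / 10 for i in range(11)]

direct = inner_product_direct(f, g, grid, grid, grid)
fast = inner_product_separable(f_factors, g_factors, grid, grid, grid)
```

For n points per axis the direct sum evaluates n^3 terms, whereas the separable form evaluates 3n. The two results agree up to floating-point rounding, which can be checked with `math.isclose(direct, fast, rel_tol=1e-9)`.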
Accepted by Journal of Computational Physics
FOS: Computer and information sciences, Computer Science - Performance, Cosmology and Nongalactic Astrophysics (astro-ph.CO), Physics and Astronomy (miscellaneous), FOS: Physical sciences, Cosmology, Many-core, Computer Science Applications, Performance (cs.PF), Xeon Phi, Computer Science - Distributed, Parallel, and Cluster Computing, Software, source code, etc. for problems pertaining to astronomy and astrophysics, Nested parallelism, Packaged methods for numerical algorithms, Distributed, Parallel, and Cluster Computing (cs.DC), cosmology, xeon phi, many-core, nested parallelism, 4006 Communications Engineering, 40 Engineering, Astrophysics - Cosmology and Nongalactic Astrophysics
