We present a case study describing efforts to optimise and modernise "Modal", the simulation and analysis pipeline used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum (or three-point correlator) of the cosmic microwave background radiation. We focus on one particular element of the code: the projection of bispectra from the end of inflation to the spherical shell at decoupling, which defines the CMB we observe today. This code involves a three-dimensional inner product between two functions, one of which requires an integral, on a non-rectangular domain containing a sparse grid. We show that by employing separable methods this calculation can be reduced to a one-dimensional summation plus two integrations, reducing the overall dimensionality from four to three. The introduction of separable functions also solves the issue of the non-rectangular sparse grid. The separable method can become unstable in certain cases, and in those cases the slower non-separable integral must be calculated instead. We discuss the optimisation of both approaches. We show significant speed-ups of ~100x, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3x and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38x. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator), we are now able to consider joint analysis for cosmological science exploitation of new data.
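The idea behind the separable approach can be illustrated with a toy example. The sketch below is not the Modal code. It assumes a rectangular domain, a simple midpoint quadrature, and hypothetical one-dimensional factor functions chosen only for illustration. It shows that when both functions in a three-dimensional inner product factorise into one-dimensional pieces, the triple sum collapses into a product of three one-dimensional sums.

```python
import math

def brute_force_inner_product(f, g, xs, weights):
    """Direct triple sum over a rectangular grid: O(N^3) evaluations."""
    total = 0.0
    for xi, wi in zip(xs, weights):
        for xj, wj in zip(xs, weights):
            for xk, wk in zip(xs, weights):
                total += wi * wj * wk * f(xi, xj, xk) * g(xi, xj, xk)
    return total

def separable_inner_product(f_factors, g_factors, xs, weights):
    """Product of three 1D sums: O(N) evaluations per dimension.

    Valid only when f(x, y, z) = f1(x) f2(y) f3(z) and likewise for g.
    """
    result = 1.0
    for f_axis, g_axis in zip(f_factors, g_factors):
        result *= sum(w * f_axis(x) * g_axis(x) for x, w in zip(xs, weights))
    return result

# Midpoint rule on [0, 1] (an illustrative choice of quadrature).
n = 20
xs = [(i + 0.5) / n for i in range(n)]
weights = [1.0 / n] * n

# Hypothetical separable factors, chosen only for the demonstration.
f_factors = (math.sin, math.cos, math.exp)
g_factors = (lambda x: x, lambda x: 1.0 + x, lambda x: x * x)

def f(x, y, z):
    return math.sin(x) * math.cos(y) * math.exp(z)

def g(x, y, z):
    return x * (1.0 + y) * z * z

slow = brute_force_inner_product(f, g, xs, weights)
fast = separable_inner_product(f_factors, g_factors, xs, weights)
print(math.isclose(slow, fast, rel_tol=1e-9))
```

Both routines compute the same quadrature sum, so they agree to rounding error, but the separable version evaluates 3N terms rather than N^3. The case described in the abstract is harder, since the domain is non-rectangular and one function involves an additional integral.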
Accepted by Journal of Computational Physics
FOS: Computer and information sciences, Computer Science - Performance, Cosmology and Nongalactic Astrophysics (astro-ph.CO), Physics and Astronomy (miscellaneous), FOS: Physical sciences, Cosmology, Many-core, Computer Science Applications, Performance (cs.PF), Xeon Phi, Computer Science - Distributed, Parallel, and Cluster Computing, Nested parallelism, Distributed, Parallel, and Cluster Computing (cs.DC), Astrophysics - Cosmology and Nongalactic Astrophysics
citations: 3. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.