
The Barnes-Hut approximation for N-body simula tions reduces the time complexity of the naive all-pairs approach from O(N 2 ) to O(N log N) by hierarchically aggregating nearby particles into single entities using a tree data structure. This inherently irregular algorithm poses substantial challenges for performance portable implementations on multi-core CPUs and GPUs. We introduce two portable fully-parallel Barnes-Hut implementation strategies that trade-off different levels of GPU support for performance: an unbalanced concurrent octree, and a balanced bounding volume hierarchy sorted by a Hilbert space filling curve. We implemente these algorithms in portable ISO C++ using parallel algorithms and concurrency primitives like atomics. The results demonstrate competitive performance on a range of CPUs and GPUs. Additionally, they highlight the effectiveness of the par execution policy for highly concurrent irregular algorithms, outperforming par_unseq on CPUs and GPUs with Independent Forward Progress.
000, 004
000, 004
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
