
Clustered manycore architectures fitted with a Network-on-Chip (NoC) and scratchpad memories enable highly energy-efficient and time-predictable implementations. However, porting applications to such processors is a programming challenge. Inspired by the one-sided communication libraries used on supercomputers and by the OpenCL async_work_group_copy primitives, we propose a simple programming layer for communication and synchronization on clustered manycore architectures. We discuss the design and implementation of this layer on the second-generation Kalray MPPA processor, where it is available from both the OpenCL and the POSIX C/C++ multithreaded programming models. Our measurements show that it reaches up to 94% of the theoretical hardware throughput, with a best-case round-trip latency of 2.2 μs when operating at 500 MHz.
