Views provided by UsageCounts
handle: 2117/107374
Task-based programming models such as OpenMP, IntelTBB and OmpSs offer the possibility of expressing dependences among tasks to drive their execution at runtime. Managing these dependences introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we present a general purpose hardware accelerator, Picos++, to manage the inter-task dependences efficiently in both time and energy. Our design also includes a novel nested task support. To this end, a new hardware/software co-design is presented to overcome the fact that nested tasks with dependences could result in system deadlocks due to the limited amount of resources in hardware task dependence managers. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ in a Linux embedded system with two ARM Cortex-A9 and a FPGA. The scalability and energy consumption of the real system implemented have been studied and compared against a software runtime. Even in a system limited to 2 threads, using Picos++ results in more than 1.8x speedup and 40% of energy savings in the most demanding parallelizations of real benchmarks. As a matter of fact, a hardware task dependence manager should be able to achieve much higher speedup and provide more energy savings with more threads.
This work is supported by the Spanish Government (projects SEV-2015-0493 and TIN2015-65316-P), by the Generalitat de Catalunya (2014-SGR-1051 and 2014-SGR- 1272), by the European Research Council (RoMoL GA 321253) and by the “Port of OmpSs to the Android platform and Hardware support for Nanos++ runtime” Project Cooperation Agreement with LG Electronics. We also thank the Xilinx University Program.
Peer Reviewed
Instruction sets, Hardware, Runtime, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles, Programming, Parallel processing, Parallel programming (Computer science), Programació en paral·lel (Informàtica), :Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC], Discrete cosine transforms
Instruction sets, Hardware, Runtime, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles, Programming, Parallel processing, Parallel programming (Computer science), Programació en paral·lel (Informàtica), :Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC], Discrete cosine transforms
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 7 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | 35 |

Views provided by UsageCounts