arXiv: 2204.10768
handle: 2117/422575
Future exascale systems will feature massive parallelism, many-core processors, and heterogeneous architectures. In this scenario, it is increasingly difficult for HPC applications to fully and efficiently utilize the resources in system nodes. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications. Research has shown that co-scheduling applications to share system nodes, instead of executing each application exclusively, can increase resource utilization and efficiency. Nevertheless, current oversubscription and co-location techniques for sharing nodes have several drawbacks that limit their applicability and make them very application-dependent. This paper presents co-execution through system-wide scheduling. Co-execution is a novel fine-grained technique to execute multiple HPC applications simultaneously on the same node, outperforming current state-of-the-art approaches. We implement this technique in nOS-V, a lightweight tasking library that supports co-execution through system-wide task scheduling. Moreover, nOS-V can be easily integrated with existing programming models, requiring no changes to user applications. We showcase how co-execution with nOS-V significantly reduces schedule makespan for several applications in single-node and distributed environments, outperforming prior node-sharing techniques.
12 pages, 10 figures. Submitted to the Journal of Parallel and Distributed Computing
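The abstract's core idea, a single node-wide scheduler that hands free cores to ready tasks from any application, can be illustrated with a toy model. The sketch below is not nOS-V code and does not use its API. It assumes unit-time tasks, applications made of barrier-separated phases, and a fixed application order when handing out cores. The names `exclusive_makespan` and `coexecution_makespan` are hypothetical.

```python
from math import ceil


def exclusive_makespan(apps, cores):
    """Time steps needed when each application owns the whole node in turn.

    `apps` is a list of applications; each application is a list of phase
    widths (number of unit-time tasks in the phase, assumed positive).
    """
    return sum(ceil(width / cores) for phases in apps for width in phases)


def coexecution_makespan(apps, cores):
    """Time steps needed when one node-wide scheduler serves all applications."""
    phase = [0] * len(apps)  # index of the current phase of each application
    remaining = [phases[0] if phases else 0 for phases in apps]
    steps = 0
    while any(p < len(a) for p, a in zip(phase, apps)):
        free = cores
        # Hand free cores to ready tasks, application by application.
        for i in range(len(apps)):
            if phase[i] < len(apps[i]):
                run = min(free, remaining[i])
                remaining[i] -= run
                free -= run
        steps += 1
        # Barrier: an application enters its next phase once the current one is done.
        for i in range(len(apps)):
            if phase[i] < len(apps[i]) and remaining[i] == 0:
                phase[i] += 1
                if phase[i] < len(apps[i]):
                    remaining[i] = apps[i][phase[i]]
    return steps


# Two applications that each keep only half of a 4-core node busy.
apps = [[2, 2, 2, 2], [2, 2, 2, 2]]
print(exclusive_makespan(apps, cores=4))    # 8
print(coexecution_makespan(apps, cores=4))  # 4
```

In this toy setting, each application alone leaves two cores idle in every phase, so sharing the node through one scheduler halves the makespan. Real workloads have variable task durations, data locality, and scheduling overheads that this model ignores.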
FOS: Computer and information sciences, Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles, Computer Science - Distributed, Parallel, and Cluster Computing, HPC, Parallel programming, Distributed, Parallel, and Cluster Computing (cs.DC), Co-execution, Co-location, Task-based programming
| Indicator | Value |
| --- | --- |
| selected citations | 2 |
| popularity | Top 10% |
| influence | Average |
| impulse | Average |
| views | 13 |
| downloads | 8 |
