Name: Power- and Fragmentation-Aware Online Scheduling for GPU Datacenters
Keywords: GPU Datacenter, Power aware Scheduling, GPU sharing, GPU Fragmentation, Online Scheduling, Green Computing, Sustainable Computing, FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Distributed, Parallel, and Cluster Computing (cs.DC)

Type: Article, Preprint, Conference object
Date: 19 May 2025
Embargo end date: 01 Jan 2024
Publisher: IEEE
Journal: 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

Authors: Lettich F.; Carlini E.; Nardini F. M.; Perego R.; Trani S.

doi: 10.1109/ccgrid64434.2025.00015, 10.5281/zenodo.15469410, 10.5281/zenodo.15469411, 10.48550/arxiv.2412.17484

arXiv: http://arxiv.org/abs/2412.17484

handle: 20.500.14243/549401

Abstract

The rise of Artificial Intelligence and Large Language Models is driving increased GPU usage in data centers for complex training and inference tasks, impacting operational costs, energy demands, and the environmental footprint of large-scale computing infrastructures. This work addresses the online scheduling problem in GPU datacenters, which involves scheduling tasks without knowledge of their future arrivals. We focus on two objectives: minimizing GPU fragmentation and reducing power consumption. GPU fragmentation occurs when partial GPU allocations hinder the efficient use of remaining resources, especially as the datacenter nears full capacity. A recent scheduling policy, Fragmentation Gradient Descent (FGD), leverages a fragmentation metric to address this issue. Reducing power consumption is also crucial due to the significant power demands of GPUs. To this end, we propose PWR, a novel scheduling policy to minimize power usage by selecting power-efficient GPU and CPU combinations. This involves a simplified model for measuring power consumption integrated into a Kubernetes score plugin. Through an extensive experimental evaluation in a simulated cluster, we show how PWR, when combined with FGD, achieves a balanced trade-off between reducing power consumption and minimizing GPU fragmentation.
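The idea of combining a power-oriented score with a fragmentation-oriented score can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the PWR power model or the FGD fragmentation metric, so the code below assumes a linear idle-to-max power model per GPU and uses a crude proxy for fragmentation (the free share stranded on partially used GPUs). The names `Node`, `power_w`, `place`, `stranded_share`, `pick_node`, and the weight `alpha` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    gpu_free: List[float]  # free share of each GPU, in [0.0, 1.0]
    gpu_idle_w: float      # assumed idle power per GPU (watts)
    gpu_max_w: float       # assumed power per GPU at full use (watts)


def power_w(node: Node, free: List[float]) -> float:
    """Assumed model: each GPU's power grows linearly with its used share."""
    span = node.gpu_max_w - node.gpu_idle_w
    return sum(node.gpu_idle_w + span * (1.0 - f) for f in free)


def place(free: List[float], demand: float) -> Optional[List[float]]:
    """Best-fit placement of a fractional GPU demand; None if it does not fit."""
    fits = [i for i, f in enumerate(free) if f >= demand]
    if not fits:
        return None
    best = min(fits, key=lambda i: free[i])
    after = list(free)
    after[best] = round(free[best] - demand, 9)
    return after


def stranded_share(free: List[float]) -> float:
    """Crude fragmentation proxy (NOT the FGD metric): free share left on
    partially used GPUs."""
    return sum(f for f in free if 0.0 < f < 1.0)


def normalize(values: List[float]) -> List[float]:
    """Min-max normalization so the two objectives are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pick_node(nodes: List[Node], demand: float, alpha: float) -> Optional[str]:
    """Return the feasible node with the lowest combined cost.
    alpha = 1.0 considers power only, alpha = 0.0 fragmentation only."""
    feasible, d_power, d_frag = [], [], []
    for node in nodes:
        after = place(node.gpu_free, demand)
        if after is None:
            continue
        feasible.append(node)
        d_power.append(power_w(node, after) - power_w(node, node.gpu_free))
        d_frag.append(stranded_share(after) - stranded_share(node.gpu_free))
    if not feasible:
        return None
    costs = [alpha * p + (1.0 - alpha) * f
             for p, f in zip(normalize(d_power), normalize(d_frag))]
    best = min(range(len(feasible)), key=lambda i: costs[i])
    return feasible[best].name


# Hypothetical cluster: C has an efficient, fully free GPU;
# D has a power-hungry GPU that is already half used.
nodes = [Node("C", [1.0], 30.0, 100.0), Node("D", [0.5], 50.0, 400.0)]
print(pick_node(nodes, 0.5, alpha=1.0))  # power only: C
print(pick_node(nodes, 0.5, alpha=0.0))  # fragmentation only: D
```

In this toy cluster the two objectives disagree: placing the task on C adds less power but leaves half a GPU stranded, while placing it on D fills a partially used GPU at a higher power cost. The weight `alpha` makes that trade-off explicit.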

Impact (by BIP!)

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average

