
The rapid growth of artificial intelligence (AI) has led to increased reliance on power-intensive Graphics Processing Units (GPUs), which are essential for training and deploying large-scale models. However, the escalating energy demands of AI workloads pose sustainability challenges, necessitating efficient power management strategies to reduce carbon footprints. Optimizing GPU server power consumption is complex because of the interdependence of its components, and conventional methods often involve trade-offs: increasing fan speed improves cooling but raises overall power usage, whereas lowering GPU clock frequencies conserves energy at the cost of longer computation times. To address these challenges, we propose a data-driven optimization framework based on offline reinforcement learning (RL). Our approach collects operational data from a custom-designed workload that simulates varying server loads, capturing key metrics such as power consumption, temperature, and core frequency, and uses a reward function that balances power efficiency with performance. The RL agent learns from pre-collected server logs, enabling intelligent real-time GPU clock control decisions without costly live experiments; periodic fan speed adjustments and pre-training of the Q-network further improve overall efficiency. Experimental results show that our method reduces power consumption by 3.62% while shortening computation time by 1.51% on synthetic workloads. For LLaMA-2 fine-tuning, power consumption decreases by 6.40% with only a minor 1.27% increase in computation time, confirming the method's practical effectiveness. The framework was validated on the recent NVIDIA L40S GPU, demonstrating compatibility with cutting-edge hardware.
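To make the data-collection and control loop concrete, the sketch below shows how the logged state (power, temperature, core clock) could be sampled and how a clock-scaling action might be applied through NVML. This is a minimal illustration, assuming the `pynvml` Python bindings; the `reward` weighting (`lam`, `p_ref`, `t_ref`) is a hypothetical stand-in for the paper's power/performance trade-off, whose exact form is not given in the abstract.

```python
# Minimal sketch: sample the metrics logged for offline RL and apply a
# discrete clock action. Assumes the `pynvml` NVML bindings and one GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def read_state():
    """Sample the state features mentioned in the abstract:
    power draw, temperature, and core (SM) clock frequency."""
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
    temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    clock_mhz = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
    return power_w, temp_c, clock_mhz

def reward(power_w, step_time_s, lam=0.5, p_ref=300.0, t_ref=1.0):
    """Hypothetical reward balancing power against computation time.
    lam, p_ref, and t_ref are illustrative constants, not the paper's values."""
    return -(power_w / p_ref) - lam * (step_time_s / t_ref)

def apply_action(clock_mhz):
    """Lock the GPU core clock to the agent's chosen frequency
    (mirrors `nvidia-smi -lgc`; requires admin privileges and a
    Volta-or-newer GPU such as the L40S)."""
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, clock_mhz, clock_mhz)
```

In an offline RL setting, such (state, action, reward) tuples would be recorded while the custom workload runs and later replayed to train the Q-network, so no live experimentation on the production server is required.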
Data-driven optimization, dynamic GPU clock scaling, GPU server power management, offline reinforcement learning
