IEEE Access
Article . 2025 . Peer-reviewed
License: CC BY NC ND
Data sources: Crossref
Data sources: DOAJ

Power Consumption Optimization of GPU Server With Offline Reinforcement Learning

Authors: Heechan Chung; Yeeun Im; Jongchan Park; Taeho Lee; Tae-Young Kim; Hyungjun Kim; Donghwan Lee

Abstract

The rapid growth of artificial intelligence (AI) has led to increased reliance on power-intensive Graphics Processing Units (GPUs), which are essential for training and deploying large-scale models. However, the escalating energy demands of AI workloads pose sustainability challenges, necessitating efficient power management strategies to reduce carbon footprints. Optimizing GPU server power consumption is complex because the server's components are interdependent. Conventional methods often involve trade-offs: increasing fan speed enhances cooling but raises overall power usage, whereas lowering GPU clock frequencies conserves energy at the cost of longer computation times. To address these challenges, we propose a data-driven optimization framework based on offline reinforcement learning (RL). Our approach collects operational data from a custom-designed workload that simulates varying server loads, capturing key metrics such as power consumption, temperature, and core frequency. The reward function balances power efficiency with performance. The RL agent learns from pre-collected server logs, enabling real-time GPU clock control decisions without costly live experiments. Additionally, periodic fan speed adjustments and pre-training of the Q-network further enhance overall efficiency. Experimental results show that our method reduces power consumption by 3.62% while also shortening computation time by 1.51% for synthetic workloads. For LLaMA-2 fine-tuning, power consumption decreases by 6.40% with only a 1.27% increase in computation time, indicating that the method remains effective on a practical workload. The framework was validated on an NVIDIA L40S GPU, confirming its compatibility with recent hardware.
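
To make the idea of learning clock control from pre-collected logs more concrete, the following is a minimal sketch of offline Q-learning, not the authors' implementation. The abstract does not specify the state, action, or reward definitions, so everything here is an assumption for illustration: a tabular Q-function stands in for the Q-network, the state is a single discretized load level, the actions are three hypothetical clock settings, the reward is a weighted difference of normalized throughput and normalized power, and the "server log" is generated by a toy power model. Fan control and Q-network pre-training are not modeled.

```python
import random
from collections import defaultdict

# Hypothetical clock settings (MHz); illustrative values, not from the paper.
CLOCKS_MHZ = [1200, 1800, 2400]

def reward(power_w, throughput, alpha=1.0, beta=2.0, p_max=350.0):
    """Assumed reward: reward throughput (0..1), penalize normalized power."""
    return beta * throughput - alpha * (power_w / p_max)

def synth_log(n, seed=0):
    """Build a fake log of (state, action, reward, next_state) transitions.

    Toy assumptions: the load level cycles 0 -> 1 -> 2 -> 0 regardless of
    the action, and a random behavior policy picked the logged clocks.
    """
    rng = random.Random(seed)
    log, load = [], 0
    for _ in range(n):
        action = rng.randrange(len(CLOCKS_MHZ))
        clock = CLOCKS_MHZ[action]
        # Toy power model: idle floor plus a term growing with clock and load.
        power = 100 + 250 * (clock / 2400) ** 3 * (0.3 + 0.35 * load)
        # Toy performance model: full speed once the clock meets the demand.
        throughput = min(1.0, clock / CLOCKS_MHZ[load])
        next_load = (load + 1) % 3
        log.append((load, action, reward(power, throughput), next_load))
        load = next_load
    return log

def offline_q_learning(log, n_actions, gamma=0.9, lr=0.1, epochs=100):
    """Repeatedly sweep the fixed log; no new interaction with the server."""
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(epochs):
        for s, a, r, s2 in log:
            target = r + gamma * max(q[s2])
            q[s][a] += lr * (target - q[s][a])
    return q

def greedy_policy(q):
    """Map each logged state to the action with the highest Q-value."""
    return {s: max(range(len(v)), key=v.__getitem__) for s, v in q.items()}

if __name__ == "__main__":
    log = synth_log(600, seed=0)
    policy = greedy_policy(offline_q_learning(log, len(CLOCKS_MHZ)))
    for load, action in sorted(policy.items()):
        print(f"load level {load}: choose {CLOCKS_MHZ[action]} MHz")
```

In this sketch the weights `alpha` and `beta` set the trade-off between saving power and preserving performance: raising `alpha` relative to `beta` pushes the learned policy toward lower clocks. A practical offline RL setup would also need to handle actions that are rare or absent in the log, which this toy example sidesteps by logging a uniformly random behavior policy.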

Keywords

Data-driven optimization, dynamic GPU clock scaling, GPU server power management, Electrical engineering. Electronics. Nuclear engineering, offline reinforcement learning, TK1-9971
