Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao ACM Transactions on ...arrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
DBLP
Article . 2025
Data sources: DBLP
versions View all 2 versions
addClaim

Thermal Management for FPGA Nodes in HPC Systems

Authors: Yingyi Luo; Joshua C. Zhao; Arnav Aggarwal; Seda Ogrenci-Memik; Kazutomo Yoshii;

Thermal Management for FPGA Nodes in HPC Systems

Abstract

The integration of FPGAs into large-scale computing systems is gaining attention. In these systems, real-time data handling for networking, tasks for scientific computing, and machine learning can be executed with customized datapaths on reconfigurable fabric within heterogeneous compute nodes. At the same time, thermal management, particularly battling the cooling cost and guaranteeing the reliability, is a continuing concern. The introduction of new heterogeneous components into HPC nodes only adds further complexities to thermal modeling and management. The thermal behavior of multi-FPGA systems deployed within large compute clusters is less explored. In this article, we first show that the thermal behaviors of different FPGAs of the same generation can vary due to their physical locations in a rack and process variation, even though they are running the same tasks. We present a machine learning–based model to capture the thermal behavior of each individual FPGA in the cluster. We then propose two thermal management strategies guided by our thermal model. First, we mitigate thermal variation and hotspots across the cluster by proactive thermal-aware task placement. Under the tested system and benchmarks, we achieve up to 26.4° C and on average 13.3° C system temperature reduction with no performance penalty. Second, we utilize this thermal model to guide HLS parameter tuning at the task design stage to achieve improved thermal response after deployment.

Related Organizations
  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    3
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
3
Average
Average
Average
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!