Publication, conference object, 2021

Testing the Divergence Stack Memory on GPGPUs: A Modular in-Field Test Strategy

Josie E. Rodriguez Condia; M. Sonza Reorda
Open Access
Published: 10 Feb 2021
Publisher: IEEE
Country: Italy
Abstract
General-Purpose Graphics Processing Units (GPGPUs) are becoming a promising solution for safety-critical applications, e.g., in the automotive domain. In these applications, reliability and functional safety are relevant factors in selecting the devices used to build the systems. Nowadays, many challenges affect the implementation of high-performance devices such as GPGPUs. Moreover, effective fault detection solutions are needed to guarantee the correct in-field operation of a GPGPU, including its branch management unit, which is one of the most critical modules in this parallel architecture. Faults affecting this structure can heavily corrupt, or even crash, the execution of an application on the GPGPU.
In this work, we propose a non-invasive Software-Based Self-Test (SBST) solution to detect faults affecting the memory in the branch management unit of a GPGPU. We propose a scalar and modular mechanism to develop the test program as a combination of software functions. The FlexGripPlus model was employed to evaluate the proposed strategies experimentally. Results show that the proposed strategies are effective at testing the target structure and detect up to 98% of permanent faults.
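To make the idea of a test program built as a combination of software functions more concrete, here is a minimal Python sketch under strong simplifying assumptions. The divergence stack is modeled as an array of per-entry active-thread masks (one bit per thread of a 32-thread warp), permanent faults are modeled as stuck-at bits, and each test function writes one data pattern at every stack level, as a chain of nested divergent branches would, then checks the levels in LIFO (pop) order. The names `DivergenceStackModel`, `pattern_function`, `run_test_program` and the pattern set are hypothetical. This is not the authors' FlexGripPlus test program: a real SBST program can only observe the stack indirectly, through which threads execute each branch path.

```python
from dataclasses import dataclass, field
from functools import partial

WARP_SIZE = 32                    # assumption: one mask bit per thread in a 32-thread warp
FULL = (1 << WARP_SIZE) - 1       # all threads active


@dataclass
class DivergenceStackModel:
    """Hypothetical behavioral model of a divergence stack memory.

    Each entry holds an active-thread mask; `stuck_at` injects permanent
    faults as {(entry, bit): value}, with value 0 or 1.
    """
    depth: int
    stuck_at: dict = field(default_factory=dict)

    def __post_init__(self):
        self.entries = [0] * self.depth

    def write(self, level, mask):
        # Apply any stuck-at faults located in this entry before storing.
        for (entry, bit), value in self.stuck_at.items():
            if entry == level:
                mask = (mask | (1 << bit)) if value else (mask & ~(1 << bit))
        self.entries[level] = mask & FULL

    def read(self, level):
        return self.entries[level]


def pattern_function(stack, pattern):
    """One modular test function: fill every level with `pattern`
    (as nested divergence would), then check the levels in LIFO order.
    Returns the list of mismatching levels."""
    for level in range(stack.depth):
        stack.write(level, pattern)
    return [lvl for lvl in reversed(range(stack.depth)) if stack.read(lvl) != pattern]


# The test program is a composition of independent functions, one per
# data pattern (hypothetical pattern set: all-0, all-1, two checkerboards).
TEST_PROGRAM = [partial(pattern_function, pattern=p)
                for p in (0x00000000, FULL, 0x55555555, 0xAAAAAAAA)]


def run_test_program(stack, program=TEST_PROGRAM):
    """Return True if any test function observes a mismatch (fault detected)."""
    return any(fn(stack) for fn in program)


print(run_test_program(DivergenceStackModel(depth=16)))                         # False
print(run_test_program(DivergenceStackModel(depth=16, stuck_at={(3, 7): 0})))   # True
```

Keeping each pattern in its own function lets the program be extended or trimmed, for example by adding patterns or changing the stack depth, without rewriting the other functions. In this toy model, the all-1 pattern exposes stuck-at-0 bits and the all-0 pattern exposes stuck-at-1 bits.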
Subjects by Vocabulary

Microsoft Academic Graph classification: Fault detection and isolation, Computer science, Software, Domain (software engineering), Modular design, Embedded system, Test strategy, General-purpose computing on graphics processing units, Stack-based memory allocation, Functional safety

ACM Computing Classification System: Hardware_INTEGRATEDCIRCUITS ComputingMethodologies_DOCUMENTANDTEXTPROCESSING

Subjects

Divergence Stack Memory; General Purpose Graphics Processing Units (GPGPUs); Software-Based Self-Test (SBST)

Funded by
EC | RESCUE
Project
RESCUE
Interdependent Challenges of Reliability, Security and Quality in Nanoelectronic Systems Design
  • Funder: European Commission (EC)
  • Project Code: 722325
  • Funding stream: H2020 | MSCA-ITN-ETN
Validated by funder