Enabling Software Resilience in GPGPU Applications via Partial Thread Protection (ICSE 2021 - Technical Track)

Who

Lishan Yang, Bin Nie, Adwait Jog, Evgenia Smirni

Track

ICSE 2021 Technical Track

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 28 May 2021 20:10 - 20:30 at Blended Sessions Room 3 - 4.5.3. Programming: Low Level Chair(s): Ignacio Panach
Sat 29 May 2021 08:10 - 08:30 at Blended Sessions Room 3 - 4.5.3. Programming: Low Level

Abstract

Graphics Processing Units (GPUs) are used to accelerate computation in a wide variety of fields, but they remain susceptible to soft errors that can easily compromise application output. By taking advantage of the hierarchical organization of an application into threads, warps, and cooperative thread arrays, we propose a framework that identifies the resilience of threads and aims to map threads with the same resilience characteristics to the same warp. This allows replication mechanisms for error detection/correction to be engaged at the warp level. By exploring 12 benchmarks (17 kernels) from 4 benchmark suites, we illustrate that threads can be remapped into reliable or unreliable warps with only 1.63% introduced overhead (on average), so that only those groups of threads that truly need protection are selectively protected. Furthermore, we show that remapping to different warps does not sacrifice application performance; surprisingly, it even improves execution in some cases. In addition, we show how this remapping facilitates warp replication for error detection and/or correction and achieves average savings of 20.61% and 27.15% of execution cycles, respectively, compared to standard duplication/triplication.
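The remapping idea in the abstract can be illustrated with a small sketch. This is not the authors' framework: it only assumes that each thread already carries a boolean "vulnerable" label (how such labels are obtained is outside this sketch), and the function names, the 128-thread block, and the "every 4th thread is vulnerable" pattern are all hypothetical. It groups threads with the same label into the same warp and counts how many warps would then need replication.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def remap_by_resilience(vulnerable):
    """Return a list mapping new slot -> original thread id.

    Hypothetical helper: a stable sort that places vulnerable threads
    first, so threads with the same label end up in the same warps.
    """
    return sorted(range(len(vulnerable)), key=lambda tid: not vulnerable[tid])


def warps_needing_protection(vulnerable, order=None):
    """Count warps that contain at least one vulnerable thread."""
    if order is None:
        order = list(range(len(vulnerable)))  # original thread-to-warp layout
    count = 0
    for start in range(0, len(order), WARP_SIZE):
        warp = order[start:start + WARP_SIZE]
        if any(vulnerable[tid] for tid in warp):
            count += 1
    return count


# Assumed example: a block of 128 threads where every 4th thread is vulnerable.
vulnerable = [tid % 4 == 0 for tid in range(128)]
order = remap_by_resilience(vulnerable)

before = warps_needing_protection(vulnerable)        # vulnerable threads spread over all warps
after = warps_needing_protection(vulnerable, order)  # vulnerable threads packed together
print(before, after)
```

In this assumed layout, every warp initially contains some vulnerable threads, so warp-level replication would have to cover all of them; after grouping, the vulnerable threads fit into a single warp and only that warp would need to be replicated.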

Link to Preprint

https://arxiv.org/abs/2103.02825

Lishan Yang

William & Mary

Bin Nie

William & Mary

Adwait Jog

William & Mary

Evgenia Smirni

William & Mary

Session Program

Fri 28 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

19:30 - 20:30	4.5.3. Programming: Low Level (Technical Track / Journal-First Papers) at Blended Sessions Room 3 +12h
Chair(s): Ignacio Panach (Universidad de Valencia)

19:30 20m Paper		A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM (Journal-First Papers). Long Zhang (KTH Royal Institute of Technology), Brice Morin (SINTEF), Philipp Haller (KTH), Benoit Baudry (KTH Royal Institute of Technology), Martin Monperrus (KTH Royal Institute of Technology)
19:50 20m Paper		Interface Compliance of Inline Assembly: Automatically Check, Patch and Refine (Technical Track, ACM SIGSOFT Distinguished Paper). Frédéric Recoules (CEA, List), Sébastien Bardin (CEA LIST, University Paris-Saclay, France), Richard Bonichon (Tweag I/O, Paris, France), Matthieu Lemerre (CEA LIST, University Paris-Saclay, France), Laurent Mounier (Univ. Grenoble Alpes, VERIMAG, Grenoble, France), Marie-Laure Potet (Univ. Grenoble Alpes, VERIMAG, Grenoble, France)
20:10 20m Paper		Enabling Software Resilience in GPGPU Applications via Partial Thread Protection (Technical Track). Lishan Yang (William & Mary), Bin Nie (William & Mary), Adwait Jog (William & Mary), Evgenia Smirni (William & Mary)

Sat 29 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

07:30 - 08:30	4.5.3. Programming: Low Level (Technical Track / Journal-First Papers) at Blended Sessions Room 3

07:30 20m Paper		A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM (Journal-First Papers). Long Zhang (KTH Royal Institute of Technology), Brice Morin (SINTEF), Philipp Haller (KTH), Benoit Baudry (KTH Royal Institute of Technology), Martin Monperrus (KTH Royal Institute of Technology)
07:50 20m Paper		Interface Compliance of Inline Assembly: Automatically Check, Patch and Refine (Technical Track, ACM SIGSOFT Distinguished Paper). Frédéric Recoules (CEA, List), Sébastien Bardin (CEA LIST, University Paris-Saclay, France), Richard Bonichon (Tweag I/O, Paris, France), Matthieu Lemerre (CEA LIST, University Paris-Saclay, France), Laurent Mounier (Univ. Grenoble Alpes, VERIMAG, Grenoble, France), Marie-Laure Potet (Univ. Grenoble Alpes, VERIMAG, Grenoble, France)
08:10 20m Paper		Enabling Software Resilience in GPGPU Applications via Partial Thread Protection (Technical Track). Lishan Yang (William & Mary), Bin Nie (William & Mary), Adwait Jog (William & Mary), Evgenia Smirni (William & Mary)