A robust technique to make a 2D advection solver tolerant to soft faults

Date

Authors

Strazdins, Peter
Harding, Brendan
Lee, Chung
Mayo, Jackson R.
Ray, Jaideep
Armstrong, Robert C.

Journal Title

Journal ISSN

Volume Title

Publisher

Access Statement

Research Projects

Organizational Units

Journal Issue

Abstract

We present a general technique to solve Partial Differential Equations, called robust stencils, which make them tolerant to soft faults, i.e. bit flips arising in memory or CPU calculations. We show how it can be applied to a two-dimensional Lax-Wendroff solver. The resulting 2D robust stencils are derived using an orthogonal application of their 1D counterparts. Combinations of 3 to 5 base stencils can then be created. We describe how these are then implemented in a parallel advection solver. Various robust stencil combinations are explored, representing tradeoff between performance and robustness. The results indicate that the 3-stencil robust combinations are slightly faster on large parallel workloads than Triple Modular Redundancy (TMR). They also have one third of the memory footprint. We expect the improvement to be significant if suitable optimizations are performed. Because faults are avoided each time new points are computed, the proposed stencils are also comparably robust to faults as TMR for a large range of error rates. The technique can be generalized to 3D (or higher dimensions) with similar benefits.

Description

Citation

Source

Procedia Computer Science

Book Title

Entity type

Publication

Access Statement

License Rights

Restricted until