Ali, Muhammad; Southern, J.; Strazdins, Peter; Harding, Brendan
A fault-tolerant version of Open MPI, an open-source implementation of the Message Passing Interface (MPI), is used to create fault-tolerant applications. It is based on the draft User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working Group. ULFM allows applications and libraries to design their own recovery methods and to control them at the user level. However, only a limited amount of research work on user level failure recovery (including the implementation and performance evaluation of this prototype)...
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.