
Application-level fault recovery: Using fault-tolerant Open MPI in a PDE solver

Ali, Muhammad; Southern, J.; Strazdins, Peter; Harding, Brendan


A fault-tolerant version of Open Message Passing Interface (Open MPI), based on the draft User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working Group, is used to create fault-tolerant applications. This allows applications and libraries to design their own recovery methods and control them at the user level. However, only a limited amount of research work on user-level failure recovery (including the implementation and performance evaluation of this prototype)...

Collections: ANU Research Publications
Date published: 2014
Type: Conference paper
Source: Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
DOI: 10.1109/IPDPSW.2014.132


File: 01_Ali_Application_level_fault_2014.pdf (545.39 kB, Adobe PDF)

Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated:  19 May 2020/ Responsible Officer:  University Librarian/ Page Contact:  Library Systems & Web Coordinator