
Application level fault recovery: Using fault-tolerant Open MPI in a PDE solver

Ali, Muhammad; Southern, J.; Strazdins, Peter; Harding, Brendan

Description

A fault-tolerant version of Open Message Passing Interface (Open MPI), based on the draft User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working Group, is used to create fault-tolerant applications. This allows applications and libraries to design their own recovery methods and control them at the user level. However, only a limited amount of research work on user level failure recovery (including the implementation and performance evaluation of this prototype)...

Collections: ANU Research Publications
Date published: 2014
Type: Conference paper
URI: http://hdl.handle.net/1885/57279
Source: Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
DOI: 10.1109/IPDPSW.2014.132

Download

File: 01_Ali_Application_level_fault_2014.pdf
Size: 545.39 kB
Format: Adobe PDF


