Application level fault recovery: Using fault-tolerant Open MPI in a PDE solver
A fault-tolerant version of Open Message Passing Interface (Open MPI), based on the draft User Level Failure Mitigation (ULFM) proposal of the MPI Forum's Fault Tolerance Working Group, is used to create fault-tolerant applications. ULFM allows applications and libraries to design their own recovery methods and control them at the user level. However, only a limited amount of research work on user-level failure recovery (including the implementation and performance evaluation of this prototype)...
Collections: ANU Research Publications
Source: Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
File: 01_Ali_Application_level_fault_2014.pdf (545.39 kB, Adobe PDF)