handle: 2117/106857
In this paper we describe the design of fault-tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the features that a standard-compliant MPI stack must support in order to provide the necessary fault-tolerance guarantees through MPI's dynamic process management. Our results, including synthetic benchmarks and applications, reveal low runtime overhead and efficient recovery, demonstrating that the existing MPI standard provided us with sufficient mechanisms to implement an effective and efficient fault-tolerant solution.

This research received funding from the European Community’s 7th Framework Programme via the DEEP-ER project under Grant Agreement no. 610476. This work has also been supported by the Spanish Ministry of Science and Innovation (contract TIN2012-34557) and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. The authors thank Jorge Bellón, from BSC, for his technical support with the Nanos++ internals.

Peer Reviewed
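The record does not show the recovery mechanism itself. As a hedged, non-authoritative illustration of the general idea of fault-tolerant offloading, the Python sketch below models a host that offloads a task to a worker, detects that the worker died, and launches a fresh worker to re-execute the task. Everything here is an assumption for illustration: `SimulatedWorker`, `offload_with_respawn`, `WorkerFailure`, the deterministic `fail_on` fault injection, and the `max_respawns` budget are hypothetical names. The actual system builds on MPI dynamic process management (whose standard spawning call is `MPI_Comm_spawn`) inside the OmpSs runtime, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


class WorkerFailure(Exception):
    """Raised when a simulated offload worker dies while running a task."""


@dataclass
class SimulatedWorker:
    """Hypothetical stand-in for one spawned offload process.

    `fail_on` lists the spawn generations that crash, so faults are
    injected deterministically instead of at random.
    """
    generation: int
    fail_on: FrozenSet[int] = frozenset()

    def run(self, task: Callable[[int], int], arg: int) -> int:
        if self.generation in self.fail_on:
            raise WorkerFailure(f"worker generation {self.generation} died")
        return task(arg)


def offload_with_respawn(
    task: Callable[[int], int],
    arg: int,
    fail_on: FrozenSet[int] = frozenset(),
    max_respawns: int = 3,
) -> Tuple[int, int]:
    """Offload task(arg); if the worker fails, spawn a fresh one and re-run.

    Re-running is safe only because `task` is a pure function whose input
    the host still holds (an assumption of this sketch). Returns
    (result, total_spawns). Re-raises the last WorkerFailure once the
    initial worker and `max_respawns` replacements have all failed.
    """
    spawns = 0
    while True:
        worker = SimulatedWorker(generation=spawns, fail_on=fail_on)
        spawns += 1
        try:
            return worker.run(task, arg), spawns
        except WorkerFailure:
            # Recovery path: drop the dead worker and loop to spawn a new
            # one, giving up only when the respawn budget is exhausted.
            if spawns > max_respawns:
                raise


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # Generations 0 and 1 crash; generation 2 completes the task.
    print(offload_with_respawn(square, 7, fail_on=frozenset({0, 1})))  # (49, 3)
```

Re-executing the whole task on a fresh worker and capping the number of respawns are illustrative design choices of this sketch; the paper's actual recovery protocol is not described in this record.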
Supercomputers, UPC subject areas::Electrical engineering, Parallel processing (Computers), Parallel programming (Computer science), High performance computing, ParaStation MPI, OmpSs programming model