
Abstract
Checkpointing an application is the act of saving the application's state during its execution on stable storage, so that if the application fails it can be restarted from the last saved state, thereby avoiding loss of the work that was already done. A heterogeneous checkpoint/restart mechanism allows one to restart an application on a possibly different hardware architecture and/or operating system than those in which the application was saved. This paper explores how to construct such a mechanism at the virtual machine level. That is, rather than dumping the entire state of the application process, the mechanism reported here dumps the state of the application as maintained by a virtual machine. During restart, the saved state is loaded into a new copy of the virtual machine, which continues running from there. The heterogeneous checkpoint/restart mechanism reported here was developed for the OCaml variant of ML. The paper reports on the main issues encountered in building such a mechanism and the design choices made, presents performance evaluations, and discusses some lessons and ideas for extending the work to native code OCaml and Java. Copyright © 2002 John Wiley & Sons, Ltd.
Computing methodologies and applications, heterogeneous checkpoint and restart, virtual machine, General topics in the theory of software, fault tolerance, Distributed systems
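The idea described in the abstract, saving the state held by a virtual machine rather than the raw process image, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy stack machine, the names `run`, `checkpoint`, `restart`, and `PROGRAM`, and the use of JSON as a portable format are hypothetical and are not the paper's OCaml mechanism.

```python
import json


def run(program, state=None, max_steps=None):
    """Run a toy stack-machine program; return (state, finished)."""
    # Hypothetical VM state: a program counter and an operand stack.
    # It holds only integers and lists, with no pointers or host-specific
    # layout, which is what makes a portable dump possible.
    if state is None:
        state = {"pc": 0, "stack": []}
    steps = 0
    while state["pc"] < len(program):
        if max_steps is not None and steps >= max_steps:
            return state, False  # paused at an instruction boundary
        op, arg = program[state["pc"]]
        if op == "push":
            state["stack"].append(arg)
        elif op == "add":
            b = state["stack"].pop()
            a = state["stack"].pop()
            state["stack"].append(a + b)
        elif op == "mul":
            b = state["stack"].pop()
            a = state["stack"].pop()
            state["stack"].append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
        state["pc"] += 1
        steps += 1
    return state, True


def checkpoint(state):
    """Serialize the VM state to text (assumed portable format: JSON)."""
    return json.dumps(state)


def restart(saved):
    """Load a saved state so that a fresh VM can continue from it."""
    return json.loads(saved)


# Computes (2 + 3) * 4.
PROGRAM = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]

# Run three instructions, checkpoint, then resume in a "new" VM.
paused_state, done_early = run(PROGRAM, max_steps=3)
saved = checkpoint(paused_state)
final_state, done = run(PROGRAM, restart(saved))
result = final_state["stack"][-1]
```

Because the saved text contains no addresses or machine-word layout, a VM on a different architecture or operating system could in principle load it and continue. A real VM such as OCaml's bytecode interpreter also has a heap, code pointers, and open channels to deal with, which this sketch ignores.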
