Fault recovery mechanism for multiprocessor servers

Publication: Article, Conference object. Date: 22 Nov 2002. Publisher: IEEE Comput. Soc. Journal: Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing

Authors: Yoshio Masubuchi; Satoshi Hoshina; Tomofumi Shimada; Hideaki Hirayama; Nobuhiro Kato

doi: 10.1109/ftcs.1997.614091


Abstract

Achieving higher reliability in open server computer systems at low cost has attracted increasing interest recently. To meet this demand, we propose a new fault recovery mechanism. We extended the recovery cache scheme to adapt it to state-of-the-art multiprocessor server computer systems and built a system-level fault recovery mechanism. It enables the system to recover from most intermittent hardware errors without rebooting. Furthermore, faulty processors can be isolated dynamically, and the system can recover not only from hardware errors but also from many operating system panics caused by unanticipated software errors. The fault recovery mechanism is implemented as an "add-on" hardware module and a controlling software module, and it is fully transparent to application programs. Thus no modification to the basic hardware is required, and binary compatibility, which is mandatory for open systems, is maintained. System performance was evaluated using the TPC-C benchmark. We also built an experimental system with prototype hardware.
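
The recovery cache idea that the abstract builds on can be illustrated with a small software sketch. This is only a generic model of the classic scheme (save the old value of a location the first time it is written after a checkpoint, then restore those values on a fault), not the paper's add-on hardware design. The class name `RecoveryCache` and its methods are hypothetical names chosen for illustration.

```python
class RecoveryCache:
    """Generic recovery-cache model (illustrative; not the paper's hardware)."""

    def __init__(self, memory):
        self.memory = memory          # list standing in for main memory
        self.before_images = {}       # addr -> value held at last checkpoint

    def checkpoint(self):
        # Commit the current state: earlier before-images are no longer needed.
        self.before_images.clear()

    def write(self, addr, value):
        # Record the old value only on the first write since the checkpoint,
        # so rollback restores the checkpointed value, not an intermediate one.
        if addr not in self.before_images:
            self.before_images[addr] = self.memory[addr]
        self.memory[addr] = value

    def rollback(self):
        # On a detected fault, restore every location modified since the
        # last checkpoint, then start a fresh recovery interval.
        for addr, old_value in self.before_images.items():
            self.memory[addr] = old_value
        self.before_images.clear()


memory = [0] * 8
cache = RecoveryCache(memory)
cache.write(2, 10)
cache.checkpoint()        # state with memory[2] == 10 is now the recovery point
cache.write(2, 99)
cache.write(5, 7)
cache.rollback()          # simulate recovery after an intermittent fault
print(memory[2], memory[5])
```

In this model, execution would resume from the checkpointed state after `rollback()`, which is the general reason such a scheme can avoid a reboot after an intermittent error.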

Related Organizations

Toshiba (Japan)
Japan

Impact by BIP!

- Selected citations: 7
- Popularity: Average
- Influence: Top 10%
- Impulse: Average