
Reliability assessment is an important step in the development of fault-tolerant computing systems. Availability, mean time to failure (MTTF), and, in general, any reliability measure are determined by the system's ability to handle faults and errors and by the rate at which these events occur. A special parameter, the coverage probability, indicates how effective the fault tolerance mechanisms embedded in the system are. In practice, the coverage is evaluated through physical or simulated fault injection experiments. Unfortunately, the extremely large number of events that can perturb the operation of a computing system makes exhaustive testing intractable. As a consequence, statistical inference has been employed to derive meaningful results from a relatively small number of fault injection experiments. This paper presents a new method for inferring the coverage probability by means of optimum 3-stage sampling. A three-dimensional space of events is considered, represented by the cross product of system inputs, times of injection, and fault locations. The fault injection consists of a pilot experiment followed by the main injection experiment. The sample size of the main experiment is chosen to minimize the cost of the fault injection for a fixed value of the variance. This approach is used to estimate the coverage probability of a hypothetical fault-tolerant system. Based on our experiments, we conclude that the optimum 3-stage sampling method is especially useful when a low variance of the coverage probability is required.
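The choice of sample size described above can be illustrated with a minimal sketch. It uses the standard textbook allocation for three-stage sampling with equal-sized units and negligible finite-population corrections, which is not necessarily the exact formulation used in the paper. The function name, the variance components `var1`, `var2`, `var3` (between inputs, between injection times within an input, and between fault locations within an injection time, as they might be estimated from the pilot experiment), and the unit costs `c1`, `c2`, `c3` are all assumptions introduced for illustration.

```python
import math


def optimum_three_stage_allocation(var1, var2, var3, c1, c2, c3, target_variance):
    """Textbook three-stage allocation (illustrative, not the paper's code).

    Assumed variance model: V = var1/n + var2/(n*m) + var3/(n*m*k)
    Assumed cost model:     C = c1*n + c2*n*m + c3*n*m*k
    where n = sampled inputs, m = injection times per input,
    k = fault locations per injection time.
    """
    # Minimizing C*V gives the per-stage sizes independently of n.
    m = math.sqrt((c1 * var2) / (c2 * var1))
    k = math.sqrt((c2 * var3) / (c3 * var2))

    # Round to usable integer sample sizes (at least one unit per stage).
    m_int = max(1, round(m))
    k_int = max(1, round(k))

    # Pick the smallest n that meets the fixed variance target.
    variance_per_input = var1 + var2 / m_int + var3 / (m_int * k_int)
    n_int = math.ceil(variance_per_input / target_variance)

    cost = n_int * (c1 + c2 * m_int + c3 * m_int * k_int)
    return n_int, m_int, k_int, cost


# Hypothetical pilot estimates and unit costs, chosen only for illustration.
n, m, k, cost = optimum_three_stage_allocation(
    var1=0.01, var2=0.04, var3=0.16, c1=16, c2=4, c3=1, target_variance=0.0011
)
```

Under these assumptions, a tighter variance target changes only the number of sampled inputs `n`; the per-input numbers of injection times and fault locations depend solely on the cost and variance ratios.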
