Published online by Cambridge University Press: 14 July 2016
Suppose that a sequence of programs, or jobs, is scheduled to be executed one after another on a computer. A program may be interrupted by a failure of the computer, which obliterates all the work the computer has accomplished, so that the program has to be run all over again. Hence it is common to save the work just completed after the computer has been working for a certain amount of time, say y units. Performing a save is assumed to take a certain amount of time, during which the computer is still subject to random failure. Whenever a failure occurs, the computer is assumed to be repaired completely, and the repair time is negligible. If a save is successful, the computer continues working from the end of the last saved work; if the computer fails during the saving process, only the unsaved work needs to be repeated. This paper discusses the optimal work size y, namely the one that maximizes the long-run average amount of work saved. In particular, the case of an exponential failure-time distribution is studied in detail. The properties of the optimal age-replacement policy are also derived when the work size y is fixed.
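To make the trade-off in y concrete, the following Python sketch computes a long-run rate of saved work for the exponential case. It rests on simplifying assumptions that the abstract does not spell out and that may differ from the paper's model: failures occur at a constant rate lam, each save takes a fixed time s > 0, repairs are instantaneous, and a failure at any point before a save completes loses the current chunk of y units. Under these assumptions a renewal-reward argument gives the rate lam*y / (exp(lam*(y + s)) - 1), whose maximizer satisfies exp(lam*(y + s)) * (1 - lam*y) = 1. The function names and parameters are hypothetical; this is an illustration, not the paper's derivation.

```python
import math


def save_rate(y: float, lam: float, s: float) -> float:
    """Long-run work saved per unit time for work size y (illustrative model).

    Assumptions (hypothetical, not from the paper): failures form a Poisson
    process with rate lam, each save takes a fixed time s, and a failure
    before the save completes loses the whole current chunk of y units.
    """
    # Each attempt lasts min(T, y + s) with T ~ Exp(lam); it succeeds with
    # probability exp(-lam*(y+s)) and then contributes y units of saved work.
    p_success = math.exp(-lam * (y + s))
    # E[min(T, y + s)] = (1 - exp(-lam*(y+s))) / lam for exponential T.
    expected_cycle = (1.0 - p_success) / lam
    # Renewal-reward theorem: expected reward per cycle / expected cycle length.
    return y * p_success / expected_cycle


def optimal_work_size(lam: float, s: float, iterations: int = 200) -> float:
    """Maximize save_rate over y by bisection on the first-order condition.

    g(y) = exp(lam*(y+s)) * (1 - lam*y) - 1 is positive at y = 0 (for s > 0),
    strictly decreasing for y > 0, and equals -1 at y = 1/lam, so the unique
    maximizer lies in (0, 1/lam).
    """
    def g(y: float) -> float:
        return math.exp(lam * (y + s)) * (1.0 - lam * y) - 1.0

    lo, hi = 0.0, 1.0 / lam
    # A fixed iteration count avoids an infinite loop when the interval
    # cannot shrink below floating-point spacing.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid  # rate still increasing: optimum lies to the right
        else:
            hi = mid  # rate decreasing: optimum lies to the left
    return 0.5 * (lo + hi)
```

For example, with lam = 1.0 and s = 0.1, optimal_work_size returns a value of roughly 0.38. Within this simplified model the optimal work size always stays below the mean time to failure 1/lam, and it shrinks toward zero as the save time s goes to zero.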