Failure detectors

Ajay D. Kshemkalyani; Mukesh Singhal

doi:10.1017/CBO9780511805318.016

15 - Failure detectors

Published online by Cambridge University Press: 05 June 2012

Ajay D. Kshemkalyani: University of Illinois, Chicago
Mukesh Singhal: University of Kentucky

Summary

Introduction

This chapter deals with the design of fault-tolerant distributed systems. It is widely known that designing and verifying fault-tolerant distributed systems are difficult problems. Consensus and atomic broadcast are two important paradigms in the design of such systems, and they find wide application. Consensus allows a set of processes to reach a common decision or value that depends upon the initial values at the processes, regardless of failures. In atomic broadcast, processes reliably broadcast messages such that they agree on the set of messages delivered and on the order of message deliveries.
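The two definitions above can be read as properties of a recorded run. The following is a minimal sketch, not the chapter's own formalization: the function names are hypothetical, validity is assumed to mean "every decided value is some process's initial value" (one common reading of "depends upon the initial values"), and only processes that did not fail are assumed to appear in the recorded outcomes.

```python
from typing import Dict, Hashable, Sequence


def consensus_ok(initial: Dict[str, Hashable],
                 decided: Dict[str, Hashable]) -> bool:
    """Check a recorded consensus outcome.

    `initial` maps each process to its initial value; `decided` maps each
    process that decided (assumed: the non-failed ones) to its decision.
    """
    values = set(decided.values())
    agreement = len(values) <= 1                 # one common decision
    validity = values <= set(initial.values())   # assumed validity rule
    return agreement and validity


def atomic_broadcast_ok(deliveries: Dict[str, Sequence[Hashable]]) -> bool:
    """Check that all processes delivered the same messages in the same order.

    `deliveries` maps each process (assumed non-failed) to the sequence of
    messages it delivered.
    """
    sequences = [list(seq) for seq in deliveries.values()]
    return all(seq == sequences[0] for seq in sequences[1:])


# p3 failed before deciding, so it is absent from `decided`.
print(consensus_ok({"p1": 0, "p2": 1, "p3": 1}, {"p1": 1, "p2": 1}))
print(atomic_broadcast_ok({"p1": ["a", "b"], "p2": ["b", "a"]}))
```

The first call describes a run that satisfies both assumed consensus properties; the second describes a run in which two processes delivered the same set of messages in different orders, which atomic broadcast rules out.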

This chapter focuses on solutions to the consensus and atomic broadcast problems in asynchronous distributed systems. In such systems, there is no upper bound on the time a process takes to execute a computation step or on the time a message takes to travel from its sender to its receiver: relative processor speeds, execution times, clock drifts, and message transmission delays are all finite but unbounded. This asynchrony is mainly caused by unpredictable loads on the system, so one cannot make timing assumptions of any kind. Synchronous systems, on the other hand, are characterized by strict bounds on execution times and message transmission delays.
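One consequence of having no timing bounds is that a timeout cannot tell a crashed process from a slow one. The sketch below illustrates this with a simple heartbeat timeout; the function name, the timestamps, and the timeout value are all assumptions made for illustration and are not taken from the chapter.

```python
from typing import Dict, Set


def suspects(last_heartbeat: Dict[str, float],
             now: float, timeout: float) -> Set[str]:
    """Return the processes whose last heartbeat is older than `timeout`."""
    return {p for p, t in last_heartbeat.items() if now - t > timeout}


# Hypothetical scenario: p2 has crashed, while p3 is alive but its
# heartbeats are delayed. From the observer's side both look the same.
last_heartbeat = {"p1": 9.5, "p2": 4.0, "p3": 4.0}
print(sorted(suspects(last_heartbeat, now=10.0, timeout=2.0)))
# prints ['p2', 'p3']
```

Raising the timeout does not remove the ambiguity in an asynchronous system, because any fixed value can be exceeded by a delay that is finite but unbounded.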

Type: Chapter
Information: Distributed Computing: Principles, Algorithms, and Systems, pp. 567-597

DOI: https://doi.org/10.1017/CBO9780511805318.016

Publisher: Cambridge University Press

Print publication year: 2008
