Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 A model of distributed computations
- 3 Logical time
- 4 Global state and snapshot recording algorithms
- 5 Terminology and basic algorithms
- 6 Message ordering and group communication
- 7 Termination detection
- 8 Reasoning with knowledge
- 9 Distributed mutual exclusion algorithms
- 10 Deadlock detection in distributed systems
- 11 Global predicate detection
- 12 Distributed shared memory
- 13 Checkpointing and rollback recovery
- 14 Consensus and agreement algorithms
- 15 Failure detectors
- 16 Authentication in distributed systems
- 17 Self-stabilization
- 18 Peer-to-peer computing and overlay graphs
- Index
15 - Failure detectors
Published online by Cambridge University Press: 05 June 2012
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 A model of distributed computations
- 3 Logical time
- 4 Global state and snapshot recording algorithms
- 5 Terminology and basic algorithms
- 6 Message ordering and group communication
- 7 Termination detection
- 8 Reasoning with knowledge
- 9 Distributed mutual exclusion algorithms
- 10 Deadlock detection in distributed systems
- 11 Global predicate detection
- 12 Distributed shared memory
- 13 Checkpointing and rollback recovery
- 14 Consensus and agreement algorithms
- 15 Failure detectors
- 16 Authentication in distributed systems
- 17 Self-stabilization
- 18 Peer-to-peer computing and overlay graphs
- Index
Summary
Introduction
This chapter deals with the design of fault-tolerant distributed systems. It is widely known that the design and verification of fault-tolerent distributed systems is a difficult problem. Consensus and atomic broadcast are two important paradigms in the design of fault-tolerent distributed systems and they find wide applications. Consensus allows a set of processes to reach a common decision or value that depends upon the initial values at the processes, regardless of failures. In atomic broadcast, processes reliably broadcast messages such that they agree on the set of messages delivered and the order of message deliveries.
This chapter focuses on solutions to consensus and atomic broadcast problems in asynchronous distributed systems. In asynchronous distributed systems, there is no bound on the time it takes for a process to execute a computation step or for a message to go from its sender to its receiver. In an asynchronous distributed system, there is no upper bound on the relative processor speeds, execution times, clock drifts, and delay during the transmission of messages although they are finite. This is mainly casued by unpredictable loads on the system that causes asynchrony in the system and one cannot make any timing assumptions of any types. On the other hand, synchronous systems are characterized by strict bounds on the execution times and message transmission delays.
- Type
- Chapter
- Information
- Distributed ComputingPrinciples, Algorithms, and Systems, pp. 567 - 597Publisher: Cambridge University PressPrint publication year: 2008