Distributed Snapshots Determining Global States of Distributed Systems
5 minute read ∼ Filed in : A paper noteIntroduction
Background
The global-state-detection algorithm
Processesdo not share clocks or memory, we need to design global-state-detection
algorithm such that each process can form a global system state with their own recored states and states of communication channels.
The global-state-detection
algorithm should has following properties:
- It must run at background concurrently with, but not disturb, this underlying computation.
- The snapshots cannot all be taken at precisely the same instant because of synchronization problems.
- Snapshot must be meaningful.
Why is it important?
stable property
: The predicate y is said to be a stable property
of D if y(S) implies y(S’) for all global states S’ of D reachable from global state S of D.
Serveral distributes system problems => problem of find an global-state-detection
algorithm and process can use it to determine whether system holds a stable property
.
For example,
- In deadlock problem => detect a
stable property
(it can be “The system is deadlocked”), and someglobal-state-detection
(deadlock-detection algorithm) can detect such state, and solve the problem accordingly. - For multiple phase distributed algorithms => detect a
stable property
so that one phase can be terminated and the next phase initiated, andglobal-state-detection
can do that.
Current problems
Many global-state-detection
algorithms (eg,. sloving deadlock problems ) are incorrect and impractical because the relationships among local process states, global system states, and points in a distributed computation
are not well understood.
Contribution
- Define these relationships between
local process states, global system states, and points in a distributed computation
- Propose an algorithm by which a process in a distributed system can determines a global state (restrict attention to the problem of detecting stable properties) of the system during computation.
Model of a distributed system
A distributed system consists of a finite set of processes
and a finite set of channels.
A global state of a distributed system is a set of component process and channel states:
Assumption
- Channels have infinite buffers. Error-free, Deliver msgs in order sent.
- Delay of message in channel is finite.
Model
Process p
: defined by a set of states.
Event e
: 5-tuples, <p, s, s’, M, c> , c and M can be null if c channels doesn’t change.
- p: process where the e happens
- s: state before e happen
- s’: state after e happen
- c: channel state altered by e
- M: message sent from c.
Global state
:
- Initial state: each process is at initial state, and channel is empty sequence.
- State transimit on event e: s’ = Next(s, e)
Examples
The algorithm
Motivation
We cannot record the states of all processes and channels at the same instant, but we can record meaningful global system states.
For example, p send msg to q along c, p => c => q
- p record local state, send msg k, and record c state, k could deplicated recoreded at both p and c.
- p record c state, send msg k , and then record local state, k could dispear because both p and c don’t have k.
Consistent global state requires:
- (Number) n msg sent along c before p’s state is recorded = n’ msg sent along c before c’s state is recorded.
- (Number) m msg received along c before q’s state is recorded = m’ msg received along c before c’s state is recorded.
- (Number) m messages received along a channel cannot exceed the n messages sent along that channel. n >= m
Global state detection algorithm
Marker-Sending Rule for a Process p.
For each channel c, incident on, and directed away from p: p sends one marker
along c after p records its state
and before p sends further messages along c.
Marker-Receiving Rule for a Process q.
On receiving a marker along with a channel C:
if (q has not recorded its state) then
q records its state;
q records the state c as the empty sequence;
else:
q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
Termination of the Algorithm
- p record state infinite time
- p send the marker to q
- q record state infinite time
Properties of the recorded global state
Example
-
p record state as A
-
p send marker to q, and q receive at global state S3
-
q receive marker and record state as D. Since q hasn’t recorded its state, q also record state of channel c’s state to be empty sequence.
-
q send marker to p
-
since p has recorded its state, p record state of channel c’s state to be M’
Note this state is not the same as s1, s2, s3. So wy need it?
The paper shows that the global state after all prerecording events and before all postrecording events is the above global state.
The above state is exactly after eo’ and before e1’.
Stability detection