Distributed Snapshots Determining Global States of Distributed Systems

Posted on January 17, 2022   5 minute read ∼ Filed in  : 

Introduction

Background

The global-state-detection algorithm

Processesdo not share clocks or memory, we need to design global-state-detection algorithm such that each process can form a global system state with their own recored states and states of communication channels.

The global-state-detection algorithm should has following properties:

  1. It must run at background concurrently with, but not disturb, this underlying computation.
  2. The snapshots cannot all be taken at precisely the same instant because of synchronization problems.
  3. Snapshot must be meaningful.

Why is it important?

stable property: The predicate y is said to be a stable property of D if y(S) implies y(S’) for all global states S’ of D reachable from global state S of D.

Serveral distributes system problems => problem of find an global-state-detection algorithm and process can use it to determine whether system holds a stable property.

For example,

  1. In deadlock problem => detect astable property (it can be “The system is deadlocked”), and some global-state-detection(deadlock-detection algorithm) can detect such state, and solve the problem accordingly.
  2. For multiple phase distributed algorithms => detect a stable property so that one phase can be terminated and the next phase initiated, and global-state-detection can do that.

Current problems

Many global-state-detection algorithms (eg,. sloving deadlock problems ) are incorrect and impractical because the relationships among local process states, global system states, and points in a distributed computation are not well understood.

Contribution

  1. Define these relationships between local process states, global system states, and points in a distributed computation
  2. Propose an algorithm by which a process in a distributed system can determines a global state (restrict attention to the problem of detecting stable properties) of the system during computation.

Model of a distributed system

A distributed system consists of a finite set of processes and a finite set of channels.

A global state of a distributed system is a set of component process and channel states:

Assumption

  1. Channels have infinite buffers. Error-free, Deliver msgs in order sent.
  2. Delay of message in channel is finite.

Model

Process p: defined by a set of states.

Event e: 5-tuples, <p, s, s’, M, c> , c and M can be null if c channels doesn’t change.

  • p: process where the e happens
  • s: state before e happen
  • s’: state after e happen
  • c: channel state altered by e
  • M: message sent from c.

Global state:

  • Initial state: each process is at initial state, and channel is empty sequence.
  • State transimit on event e: s’ = Next(s, e)

Examples

image-20220128153919005

The algorithm

Motivation

We cannot record the states of all processes and channels at the same instant, but we can record meaningful global system states.

For example, p send msg to q along c, p => c => q

  1. p record local state, send msg k, and record c state, k could deplicated recoreded at both p and c.
  2. p record c state, send msg k , and then record local state, k could dispear because both p and c don’t have k.

Consistent global state requires:

  1. (Number) n msg sent along c before p’s state is recorded = n’ msg sent along c before c’s state is recorded.
  2. (Number) m msg received along c before q’s state is recorded = m’ msg received along c before c’s state is recorded.
  3. (Number) m messages received along a channel cannot exceed the n messages sent along that channel. n >= m

Global state detection algorithm

Marker-Sending Rule for a Process p.

For each channel c, incident on, and directed away from p: p sends one marker along c after p records its state and before p sends further messages along c.

Marker-Receiving Rule for a Process q.

On receiving a marker along with a channel C:

if (q has not recorded its state) then 
	q records its state;
  q records the state c as the empty sequence;
else:
	q records the state of c as the sequence of messages received along c after q’s state 	was recorded and before q received the marker along c.

Termination of the Algorithm

  1. p record state infinite time
  2. p send the marker to q
  3. q record state infinite time

Properties of the recorded global state

Example

image-20220129184614212

  1. p record state as A

  2. p send marker to q, and q receive at global state S3

  3. q receive marker and record state as D. Since q hasn’t recorded its state, q also record state of channel c’s state to be empty sequence.

  4. q send marker to p

  5. since p has recorded its state, p record state of channel c’s state to be M’

    image-20220129185012677

Note this state is not the same as s1, s2, s3. So wy need it?

The paper shows that the global state after all prerecording events and before all postrecording events is the above global state.

image-20220129195930582

The above state is exactly after eo’ and before e1’.

Stability detection

image-20220129194805388





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.