Practical Byzantine Fault Tolerance
5 minute read ∼ Filed in : A paper note
Questions
-
How many replicas, and what should be the quorum size of BFT?
-
In Paxos, we need n - f > n/2 => n > 2f => 3 nodes tolerate 1 faulty node.
-
But in PBFT this is not enough: we need n - 2f > f => n > 3f, so n = 3f+1 at least.
n = 3f+1 servers, f of which can be faulty. Unlimited clients.
-
Any two quorums must intersect in at least one honest node; quorum size = 2f+1.
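A tiny check of this arithmetic (my own sketch, not from the paper): with n = 3f+1 and quorums of 2f+1, any two quorums overlap in at least f+1 nodes, so at least one node in the overlap is honest.

```python
# Sketch: PBFT quorum arithmetic for small f (helper name pbft_sizes is mine).
def pbft_sizes(f: int):
    n = 3 * f + 1            # minimum replicas to tolerate f Byzantine faults
    quorum = 2 * f + 1       # quorum size
    overlap = 2 * quorum - n # any two quorums overlap in at least this many nodes
    honest_in_overlap = overlap - f  # even if all f faulty nodes sit in the overlap
    return n, quorum, overlap, honest_in_overlap

for f in range(1, 5):
    n, q, o, h = pbft_sizes(f)
    print(f"f={f}: n={n}, quorum={q}, overlap>={o}, honest in overlap>={h}")
    assert h >= 1            # at least one honest node in any two-quorum intersection
```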
-
Setup:
- Servers and clients can sign messages with their private key, and others verify them with the corresponding public key.
- The whole system runs as a sequence of views, where each view has one primary and many backups. The primary accepts a new command from the client, assigns it a slot (sequence) number, and then syncs it to the backups (see the sketch after this list).
- Matching rule: all replicas have the same "n" and D(s) (slot number and request digest).
- What if the primary is faulty?
-
It can send wrong or conflicting commands to the backups.
Replicas communicate with each other to make sure they agree on the same command => need pre-prepare/prepare.
-
It can send different results to the client.
All replicas reply to the client directly, and the client waits for f+1 matching results.
-
It can ignore clients.
View change switches to a new primary. No election is needed; everyone already knows who is next (the view number determines the primary).
-
What if a faulty primary or a faulty backup impersonates another node?
Messages are signed with each node's private key, so no node can pretend to be another.
-
What if the backup is faulty?
-
Up to f faulty backups are tolerated by the 2f+1 quorums; a faulty client is constrained by access control.
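A minimal sketch of the view/primary structure described in the Setup above (the names PrePrepare, primary_of, and request_digest are mine, not the paper's): the primary of view v is v mod n, and a pre-prepare records the view, the assigned slot number, and the request digest.

```python
import hashlib
from dataclasses import dataclass

def primary_of(view: int, n: int) -> int:
    # PBFT picks the primary deterministically from the view number: p = v mod n.
    return view % n

@dataclass
class PrePrepare:
    view: int     # current view
    seq: int      # slot (sequence) number assigned by the primary
    digest: str   # digest of the client request; a real message is also signed

def request_digest(request: bytes) -> str:
    return hashlib.sha256(request).hexdigest()

n = 4  # 3f+1 replicas with f = 1
msg = PrePrepare(view=0, seq=1, digest=request_digest(b"op: x = 1"))
print("primary of view 0 is replica", primary_of(0, n))
print(msg)
```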
- Basic protocol:
-
pre-prepare
- prepare certificate
-
why need 2f+1 =>
2f matching prepares from other replicas plus the one pre-prepare from the primary, all for a single slot
=> those form 2f+1 matching messages (a prepare certificate).
=> any two sets of 2f+1 intersect in at least one honest node, and an honest node never sends different commands for the same slot.
=> so, since the 2f+1 messages match, they must carry the same op as that honest node; two conflicting certificates for the same slot are impossible.
=> and the replica holding the certificate knows that 2f+1 replicas have accepted the same value for the same slot.
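A rough sketch of how a replica could count matching messages for one slot (the class and method names are illustrative, not the paper's API): the pre-prepare plus 2f matching prepares form the 2f+1 prepare certificate.

```python
from collections import defaultdict

class PrepareLog:
    """Counts matching pre-prepare/prepare messages per (view, seq, digest)."""
    def __init__(self, f: int):
        self.f = f
        self.votes = defaultdict(set)  # (view, seq, digest) -> sender ids

    def add(self, sender: int, view: int, seq: int, digest: str) -> None:
        self.votes[(view, seq, digest)].add(sender)

    def has_prepare_certificate(self, view: int, seq: int, digest: str) -> bool:
        # 1 pre-prepare from the primary + 2f matching prepares = 2f+1 messages.
        return len(self.votes[(view, seq, digest)]) >= 2 * self.f + 1

log = PrepareLog(f=1)
log.add(0, view=0, seq=1, digest="d1")  # pre-prepare from the primary (replica 0)
log.add(1, view=0, seq=1, digest="d1")  # prepare from replica 1
log.add(2, view=0, seq=1, digest="d1")  # prepare from replica 2
print(log.has_prepare_certificate(0, 1, "d1"))  # True: 2f+1 = 3 matching messages
```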
-
Can we commit now?
No, because a replica doesn't know whether the others have also collected 2f+1 matching pre-prepare/prepare messages.
So we need another round of communication (commit) to tell each replica that a quorum has accepted the correct value for this slot (2f+1 matching).
-
commit certificate, why need 2f+1
Only when a replica holds the prepare certificate can it send a commit message.
=> once a replica receives 2f+1 matching commit messages, it knows that 2f+1 replicas hold the prepare certificate.
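Continuing the same style of sketch (still illustrative names): a replica may send commit only after it holds the prepare certificate, and the slot becomes committed-local once 2f+1 matching commits arrive.

```python
class CommitState:
    """Per-slot state: may commit only after the prepare certificate is held."""
    def __init__(self, f: int):
        self.f = f
        self.prepared = False   # True once the prepare certificate is collected
        self.commits = set()    # ids of replicas whose matching commit we received

    def on_prepare_certificate(self) -> None:
        self.prepared = True    # only now may this replica broadcast its own commit

    def on_commit(self, sender: int) -> bool:
        self.commits.add(sender)
        # committed-local: prepare certificate held AND 2f+1 matching commits seen,
        # which proves at least f+1 honest replicas also hold the prepare certificate.
        return self.prepared and len(self.commits) >= 2 * self.f + 1

slot = CommitState(f=1)
slot.on_prepare_certificate()
for sender in (0, 1, 2):
    done = slot.on_commit(sender)
print(done)  # True after 3 = 2f+1 matching commits
```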
-
The client waits for f+1 matching replies.
Why is 2f+1 not needed? Because f+1 matching replies already guarantee a correct result.
PBFT ensures that once a non-faulty replica replies, the operation is committed, so a single reply from an honest node is correct. f+1 matching replies contain at least one non-faulty replica since at most f replicas can be faulty. (The remaining f nodes may simply have crashed and never reply.)
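A small client-side sketch (the helper wait_for_result is assumed, not from the paper): return the first result that gathers f+1 matching replies.

```python
from collections import Counter

def wait_for_result(replies, f: int):
    """Return the first result backed by f+1 matching replies, else None."""
    counts = Counter()
    for replica_id, result in replies:  # replies arrive one by one
        counts[result] += 1
        if counts[result] >= f + 1:     # f+1 matches include at least one honest replica
            return result
    return None

print(wait_for_result([(1, "ok"), (2, "bad"), (3, "ok")], f=1))  # -> "ok"
```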
-
View Change
-
Backups monitor the primary. If the primary stops responding to pings or the backups time out waiting for requests to execute, they start a view change.
-
All replicas send view-change messages to the new primary, and the new primary then broadcasts a new-view message to all replicas.
-
Backups send all their prepare certificates (set P) => with which the new primary can determine which operations may already be committed.
-
The new primary sends the new-view message to all replicas, including all gathered prepare certificates, with which each replica can verify that the primary is not faulty.
It also contains pre-prepare messages, which re-propose those operations under the new leader.
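A rough shape of these messages (field names are my own): each backup ships its set P of prepare certificates in its view-change message, and the new primary re-issues pre-prepares for every slot found there.

```python
from dataclasses import dataclass, field

@dataclass
class ViewChange:
    new_view: int
    prepared_certs: list            # set P: prepare certificates this backup holds

@dataclass
class NewView:
    view: int
    view_changes: list              # the view-change messages that justify it
    pre_prepares: list = field(default_factory=list)  # re-issued for slots found in P

def build_new_view(view: int, vcs: list) -> NewView:
    # Re-propose every slot that appears in some prepare certificate, so operations
    # that may already have committed in the old view are carried into the new one.
    pre_prepares = [cert for vc in vcs for cert in vc.prepared_certs]
    return NewView(view=view, view_changes=vcs, pre_prepares=pre_prepares)

nv = build_new_view(1, [ViewChange(1, ["slot 5: op A"]), ViewChange(1, [])])
print(nv.pre_prepares)  # ['slot 5: op A']
```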
-
Garbage collection
Servers periodically agree on a checkpoint (stable once 2f+1 matching checkpoint messages are collected), so older log entries can be discarded.
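A sketch of how checkpointing could let the log shrink (helper names are assumed): a checkpoint becomes stable once 2f+1 replicas vouch for the same state digest, and everything at or below it can be dropped.

```python
def stable_checkpoint(checkpoint_votes: dict, f: int):
    """checkpoint_votes maps (seq, state_digest) -> set of replica ids that sent it."""
    stable = [k for k, senders in checkpoint_votes.items() if len(senders) >= 2 * f + 1]
    return max(stable, default=None)   # highest (seq, digest) proven stable, if any

def truncate_log(log: dict, stable_seq: int) -> dict:
    # Entries at or below the stable checkpoint can be garbage-collected.
    return {seq: entry for seq, entry in log.items() if seq > stable_seq}

votes = {(100, "digest-a"): {0, 1, 2}, (110, "digest-b"): {0}}
cp = stable_checkpoint(votes, f=1)     # (100, 'digest-a'); 110 lacks 2f+1 votes
log = truncate_log({90: "...", 100: "...", 105: "..."}, stable_seq=cp[0])
print(cp, log)                          # entries <= 100 dropped
```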
-
Safety and Liveness
It cannot guarantee liveness under full asynchrony, the same as Paxos.
It does not scale well.
Algorand/HotStuff => perform better and are used in practice.
- From PBFT to HotStuff, async => sync.
Introduction
Assumption
-
Runs in an asynchronous distributed system where nodes are connected by a network.
The network may fail to deliver messages, delay them, duplicate them, or deliver them out of order.
-
Independent node failures.
-
It uses cryptographic techniques: signing a digest of a message and appending the signature to the plaintext of the message.
-
Faulty nodes may be controlled by a strong adversary, but the adversary is computationally bound: it cannot break the hash function or forge signatures.
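A concrete example of "sign a digest and append it to the plaintext", using Ed25519 from the third-party cryptography package (the wire layout here is my own toy format, not the paper's):

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

sk = Ed25519PrivateKey.generate()
pk = sk.public_key()

message = b"REQUEST op=incr(x) timestamp=17 client=c0"
digest = hashlib.sha256(message).digest()   # D(m)
signature = sk.sign(digest)                 # sign the digest rather than the full message
wire = message + signature                  # plaintext with the 64-byte signature appended

plaintext, sig = wire[:-64], wire[-64:]     # Ed25519 signatures are always 64 bytes
pk.verify(sig, hashlib.sha256(plaintext).digest())  # raises InvalidSignature if tampered
print("signature verified")
```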
Properties
-
Safety:
-
Definition: Replicated service satisfies linearizability (behaves like a centralized implementation that executes operations atomically one at a time)
-
Requires:
-
Faulty replica f < n/3
-
Replicas run deterministically, and they must start in the same state.
=> all non-faulty replicas agree on a total order for executing requests despite failures.
-
Faulty client: Access control
-
liveness
It relies on synchrony to provide liveness.
Delay is the time between the moment when a message is sent for the first time and the moment when it is received by its destination.
The view-change protocol allows the system to make progress when the primary fails.
-
3f+1 nodes can tolerate f faulty nodes: n - 2f > f => n > 3f.
-
Algorithms
Refer to the above.
- client request
- timestamp: for exactly-once semantics
pre-prepare + prepare => ensure the total order of operations (each operation gets a single slot).
committed and committed_local
- reply to client
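Tying the phases together, a toy summary of how many matching messages each step needs for f faulty replicas (the quorum numbers come from the notes above; the function itself is illustrative):

```python
# Matching messages needed at each step for f faulty replicas (numbers from the notes).
PHASES = ["client request", "pre-prepare", "prepare", "commit", "client reply"]

def matching_needed(phase: str, f: int) -> int:
    return {
        "pre-prepare": 1,            # from the primary only
        "prepare": 2 * f + 1,        # pre-prepare + 2f matching prepares
        "commit": 2 * f + 1,         # commit certificate => committed_local
        "client reply": f + 1,       # client waits for f+1 matching replies
    }.get(phase, 1)

for phase in PHASES:
    print(f"{phase:14s} -> {matching_needed(phase, f=1)} matching message(s)")
```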
-
Optimizations