
Why “Ghost Replay” Threatens Distributed Consistency and How Paxos, Raft, and ZAB Solve It

This article explains the “ghost replay” problem, an inconsistency that stems from the “third state” of distributed requests, and examines how the consensus protocols Paxos (Multi‑Paxos), Raft, and ZAB address it through epoch IDs, Noop entries, and leader election rules that stop lost log entries from reappearing.

Alibaba Cloud Developer

Mar 12, 2020


1. “Ghost Replay” Problem

A distributed request has three possible outcomes: success, failure, or an unknown result such as a timeout. In the unknown case the server did in fact either succeed or fail, but the client cannot tell which. “Ghost replay” is one form of this “third state” problem: an entry whose write timed out, and which later reads did not return, can suddenly reappear when the log is replayed after a leader change.

Consensus protocols such as Paxos, Raft, and ZAB keep replicated data consistent while remaining highly available. A Paxos group of 3 to 5 nodes keeps serving as long as a majority is alive, so it tolerates the failure of a minority of its nodes. In practice a leader is elected to issue all proposals, which avoids conflicting proposals and improves efficiency.
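The majority arithmetic behind that claim can be sketched in a few lines. `quorum_size` and `tolerated_failures` are illustrative helper names, not part of any protocol's API.

```python
def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node group."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that can fail while a majority can still be reached."""
    return n - quorum_size(n)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A 4-node group tolerates no more failures than a 3-node group, which is one reason consensus groups usually have an odd number of nodes.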

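In extreme scenarios such as network partitions or node crashes, leader switches and log recovery can cause ghost replay. The following table illustrates a case in three rounds. In round 1, leader A writes entries 1‑10, but only 1‑5 reach a majority; the client's requests for 6‑10 time out. In round 2, A fails and B becomes leader; B writes new entries 6‑20, but only 6 and 20 are persisted on a majority. In round 3, A becomes leader again and must re-confirm every index from 6 to 20, including its own stale copies of 7‑10: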

For index 6, A runs a round of basic Paxos, discovers a value accepted under a higher proposal ID (B's), discards its own entry, and adopts the majority-accepted one.

For indexes 7‑10, no value reached a majority in round 2, so if A's prepare phase sees only its own copies, A proposes its local entries and they now reach a majority.

For indexes 11‑19, no accepted value exists on the majority A contacts, so A fills the gap with Noop entries.

For index 20, B's entry was accepted by a majority, so A re-confirms it unchanged.

The second case is the problem: entries 7‑10, which clients could not read during round 2, reappear like ghosts in round 3. In a transfer use case, a client that saw its timed-out transfer missing and retried it would end up charged twice.
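The round-3 recovery described above can be replayed in a few lines. This is a minimal sketch under stated assumptions: each log is a hypothetical dict from index to (proposal ID, value), round 1 uses proposal ID 1, round 2 uses proposal ID 2, and in round 3 A can reach only C. `recover` models just the Paxos phase-1 rule (adopt the value accepted under the highest proposal ID, otherwise fill with a Noop), not a full Paxos implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a log maps index -> (proposal_id, value).
Log = Dict[int, Tuple[int, str]]


def recover(acceptor_logs: List[Log], first: int, last: int) -> Dict[int, str]:
    """Re-confirm indexes first..last as a new leader would in Paxos phase 1:
    adopt the value accepted under the highest proposal ID, otherwise a Noop."""
    chosen: Dict[int, str] = {}
    for index in range(first, last + 1):
        accepted = [log[index] for log in acceptor_logs if index in log]
        chosen[index] = max(accepted)[1] if accepted else "noop"
    return chosen


# Round 1: leader A (proposal 1) writes 1-10; only 1-5 reach B and C.
node_a: Log = {i: (1, f"A{i}") for i in range(1, 11)}
node_b: Log = {i: (1, f"A{i}") for i in range(1, 6)}
node_c: Log = dict(node_b)

# Round 2: leader B (proposal 2) writes 6-20; only 6 and 20 also reach C.
node_b.update({i: (2, f"B{i}") for i in range(6, 21)})
node_c.update({6: (2, "B6"), 20: (2, "B20")})

# Round 3: A leads again and can reach only C, so {A, C} is its majority.
recovered = recover([node_a, node_c], 6, 20)
print(recovered[6], recovered[7], recovered[11], recovered[20])  # B6 A7 noop B20
```

Indexes 7‑10 come back with A's round-1 values: that is the ghost. Had A's majority included B instead of C, those indexes would have adopted B's round-2 values, which is why the text says A only *may* re-propose its own entries.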

2. Solving “Ghost Replay” with Multi‑Paxos

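Multi‑Paxos can record an epochID in each log entry, using the ProposalID of the current leader as the epoch. When a replica replays the log in index order, any entry whose epochID is smaller than that of an entry before it must have been written by an older leader after a newer one took over; it is identified as a ghost and ignored.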
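A minimal sketch of that replay filter, assuming each entry carries an integer epochID (the ProposalID of the leader that wrote it) and that Noop entries, modeled here with an epochID of `None`, are skipped without moving the watermark. The entry layout and the function name `replay` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical entry layout: (epoch_id, payload); a Noop carries epoch_id None.
Entry = Tuple[Optional[int], str]


def replay(entries: Iterable[Entry]) -> List[str]:
    """Apply entries in log order, ignoring any whose epochID is below the largest seen so far."""
    applied: List[str] = []
    max_epoch = 0
    for epoch, payload in entries:
        if epoch is None:
            continue  # Noop: nothing to apply, and it does not move the watermark
        if epoch < max_epoch:
            continue  # written by an older leader after a newer one: a ghost
        max_epoch = epoch
        applied.append(payload)
    return applied


# Log after round 3 of the scenario (epochs 1 and 2 are the round-1 and round-2 ProposalIDs).
log: List[Entry] = (
    [(1, f"A{i}") for i in range(1, 6)]
    + [(2, "B6")]
    + [(1, f"A{i}") for i in range(7, 11)]
    + [(None, "noop")] * 9  # indexes 11-19
    + [(2, "B20")]
)
print(replay(log))  # ['A1', 'A2', 'A3', 'A4', 'A5', 'B6', 'B20']
```

Entries 7‑10 carry epoch 1 but follow index 6 with epoch 2, so they are dropped. Letting the Noops advance the watermark would wrongly discard index 20, which is why this sketch skips them.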

3. Solving “Ghost Replay” with Raft

3.1 Raft Log Recovery

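In Raft, a newly elected leader is guaranteed to hold every committed entry. This follows from the pigeonhole principle (the “drawer principle”): the majority that stored a committed entry and the majority that voted for the leader share at least one node, and that node votes only for a candidate whose log is at least as up to date as its own. The leader may also hold uncommitted entries from earlier terms, which must be recovered and committed consistently.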

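Raft adds a constraint: a leader never commits an entry from a previous term by counting how many nodes store it. Such an entry becomes committed only when an entry from the leader's current term that follows it has been replicated to a majority, which, by Raft's Log Matching property, commits all preceding entries as well.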

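Because a new leader may not receive a client request for some time, Raft has it append a special Noop entry for its own term immediately after election. Once the Noop is replicated to a majority, all earlier entries, including those left over from previous terms, are committed along with it.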

Maximum-commit principle: the new leader commits every entry in its log up to the Noop, so no already committed data is lost.

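Read safety: until its Noop is committed, the new leader does not know its own commit index, so it does not serve reads; this ensures clients never read uncommitted data. Once the Noop is committed, the service resumes normal operation.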
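The commit constraint and the role of the Noop can be sketched as the leader-side rule described in the Raft paper: advance commitIndex to the highest index stored on a majority, but only if that entry belongs to the current term. Parameter names here are illustrative and indexes are 1-based.

```python
from typing import List


def advance_commit_index(log_terms: List[int], match_index: List[int],
                         current_term: int, commit_index: int) -> int:
    """Return the leader's new commit index.

    log_terms[i - 1] is the term of entry i; match_index holds, for every node
    including the leader itself, the highest index known to be replicated there.
    """
    majority = len(match_index) // 2 + 1
    # Highest index stored on at least a majority of nodes.
    n = sorted(match_index, reverse=True)[majority - 1]
    # Only an entry from the current term may be committed by counting replicas;
    # committing it also commits every earlier entry (Log Matching property).
    if n > commit_index and log_terms[n - 1] == current_term:
        return n
    return commit_index


# Term-3 leader: entry 3 (term 2) is on a majority but stays uncommitted.
print(advance_commit_index([1, 1, 2], [3, 3, 1], current_term=3, commit_index=2))
# After a term-3 Noop at index 4 reaches a majority, indexes 3 and 4 commit together.
print(advance_commit_index([1, 1, 2, 3], [4, 4, 1], current_term=3, commit_index=2))
```

The first call returns 2 and the second returns 4. Because terms never decrease along a log, checking only the highest majority-replicated index is enough.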

3.2 Raft Solution to Ghost Replay

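Raft rules out the third round of the scenario above. When voting, a node compares the candidate's last log term first and its last log index second, and refuses to vote for a candidate whose log is less up to date than its own. A's last entry is from round 1, while a majority already holds an entry from round 2, so A cannot win an election and its stale entries 7‑10 are never re-introduced.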
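The vote check can be sketched as follows. The (last term, last index) pairs are hypothetical values for the scenario mapped onto Raft: because Raft logs have no holes, C's log ends at index 6 (term 2), while B's runs to index 20.

```python
from typing import Tuple

# (last_log_term, last_log_index) of a node's log.
LogTail = Tuple[int, int]


def candidate_is_up_to_date(candidate: LogTail, voter: LogTail) -> bool:
    """Raft's election restriction: compare last log terms first, then last indexes."""
    cand_term, cand_index = candidate
    voter_term, voter_index = voter
    if cand_term != voter_term:
        return cand_term > voter_term
    return cand_index >= voter_index


# Round 3: A's last entry is index 10 from term 1; B and C hold term-2 entries.
a, b, c = (1, 10), (2, 20), (2, 6)
votes_for_a = 1 + sum(candidate_is_up_to_date(a, v) for v in (b, c))  # A votes for itself
print(votes_for_a)  # 1, short of the 2 votes a 3-node majority needs
```

A's longer log does not help it: a higher last term always wins, so the node holding the stale round-1 entries cannot lead.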

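In a more general version of the scenario, suppose B became leader in round 2 but wrote no entries. A's log (last entry at index 10, from round 1) would then still be the most up to date, so A could win round 3 and commit entries 6‑10 that were invisible during round 2. Raft's rule that a new leader must first commit an entry from its own term (the Noop) closes this gap: once B's Noop reaches a majority, A can no longer win an election, and A's stale entries are overwritten when it rejoins as a follower. If B crashes before its Noop is committed, B never served reads, so no client could have observed entries 6‑10 as missing.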

4. Solving “Ghost Replay” with ZAB

4.1 ZAB Log Recovery

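ZAB separates an atomic broadcast phase from a crash-recovery phase. Leader election chooses the node with the highest zxid, a 64-bit ID whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. Because the epoch is compared first, the new leader is guaranteed to hold the most recent data.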
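The zxid layout can be illustrated with plain integers: with the epoch in the high 32 bits, comparing zxids as integers compares epochs first and counters second. `make_zxid` and `split_zxid` are illustrative helpers, not ZooKeeper functions.

```python
from typing import Tuple


def make_zxid(epoch: int, counter: int) -> int:
    """Pack a zxid: epoch in the high 32 bits, per-epoch counter in the low 32 bits."""
    if not (0 <= epoch < 2**32 and 0 <= counter < 2**32):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << 32) | counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Recover (epoch, counter) from a packed zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Any zxid from a newer epoch beats every zxid from an older epoch, whatever the counters.
old = make_zxid(1, 10_000)
new = make_zxid(2, 1)
print(new > old, split_zxid(new))  # True (2, 1)
```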

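After election, the leader collects each follower's last zxid, brings each follower up to date (sending the entries it is missing and having it discard entries the leader does not have), and waits for acknowledgments from a majority before entering the broadcast phase.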
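The recovery step can be sketched with a simplified model in which each log is a list of (epoch, counter) zxids. This is an assumption-laden illustration, not ZooKeeper's actual synchronization protocol (which chooses between sending a diff, truncating the follower, or sending a full snapshot); `plan_sync` and `ready_to_broadcast` are hypothetical names.

```python
from typing import List, Tuple

Zxid = Tuple[int, int]  # (epoch, counter), compared epoch first


def plan_sync(leader_log: List[Zxid], follower_log: List[Zxid]) -> Tuple[int, List[Zxid]]:
    """Return (keep, to_send): the follower keeps its first `keep` entries, which match
    the leader's log, drops anything after them, and receives `to_send` from the leader."""
    keep = 0
    while (keep < min(len(leader_log), len(follower_log))
           and leader_log[keep] == follower_log[keep]):
        keep += 1
    return keep, leader_log[keep:]


def ready_to_broadcast(acks: int, ensemble_size: int) -> bool:
    """Leave recovery once a majority of the ensemble (leader included) has acknowledged."""
    return acks >= ensemble_size // 2 + 1


leader = [(1, 1), (1, 2), (2, 1)]
follower = [(1, 1), (1, 2), (1, 3)]  # (1, 3) never reached a majority
print(plan_sync(leader, follower))                  # (2, [(2, 1)])
print(ready_to_broadcast(acks=2, ensemble_size=3))  # True
```

The follower's stale entry (1, 3) is discarded and the leader's (2, 1) is sent; with 2 of 3 nodes acknowledged, the leader can start broadcasting.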

4.2 ZAB Solution to Ghost Replay

Each election increments the epoch, and the new leader records the current epoch in a local file (CurrentEpoch). During voting, epochs are compared first, so a node with a smaller epoch loses to any node that has already seen the newer one; it cannot become leader, even if its log holds more entries, and therefore cannot re-introduce stale logs.
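The vote ordering can be sketched as follows, modeled loosely on the ordering ZooKeeper's leader election uses (epoch first, then zxid, then server id). The `Vote` type, `better_vote`, and the scenario values are hypothetical: B won epoch 2 and C, as its follower, persisted CurrentEpoch = 2 before B crashed without writing anything, while A returns with epoch 1 and last zxid (1, 10).

```python
from typing import NamedTuple


class Vote(NamedTuple):
    epoch: int      # the proposed leader's current epoch (persisted, e.g. in CurrentEpoch)
    zxid: int       # the proposed leader's last zxid (epoch << 32 | counter)
    server_id: int


def better_vote(new: Vote, cur: Vote) -> bool:
    """Prefer the higher epoch, then the higher zxid, then the higher server id."""
    return (new.epoch, new.zxid, new.server_id) > (cur.epoch, cur.zxid, cur.server_id)


a_vote = Vote(epoch=1, zxid=(1 << 32) | 10, server_id=1)
c_vote = Vote(epoch=2, zxid=(1 << 32) | 5, server_id=3)
print(better_vote(c_vote, a_vote))  # True: C's newer epoch outweighs A's longer log
```

Comparing zxids alone would elect A, whose zxid is larger, and replay its stale entries; putting the persisted epoch first is what prevents that.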

5. Further Discussion

Alibaba Cloud’s Yuwā consistency system uses mechanisms similar to those of Raft and ZAB to ensure that a node capable of causing ghost replay cannot win a new election. The core idea is that after a failover the new leader must establish exactly which entries are committed, using a boundary log (StartWorking, Noop, or CurrentEpoch) to decide whether each earlier entry is committed or dropped. An entry in the “third state” is therefore resolved once and never flips later.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


