
Designing Pulsar Disaster Recovery: Multi‑Region Replication, Rack‑Aware Placement, and Dual‑Write Strategies

This article explains how Tencent engineers configure Apache Pulsar for disaster recovery, covering multi‑replica consistency, journal handling, rack‑aware placement, GEO cross‑region replication, dual‑write/dual‑consume setups, and operational lessons for ad and billing scenarios.

Tencent Cloud Middleware

Background

Apache Pulsar is a multi‑tenant, high‑performance messaging platform that provides low latency, read‑write separation, rapid scaling, and flexible fault tolerance. It is widely used in large‑scale services where different business lines require different disaster‑recovery (DR) levels.

Replica quorum and strong consistency

Pulsar guarantees strong consistency through a quorum mechanism. A write is considered successful once the number of acknowledgments A (the Ack Quorum) satisfies A > W/2, where W is the Write Quorum; because A is a majority of W, any two acknowledged writes overlap in at least one Bookie. The system uses a pipelined production model, sequential and striped writes to reduce disk I/O, and multiple cache layers to lower network overhead.
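The majority condition above can be sketched in a few lines. This is an illustrative model, not Pulsar's actual implementation; the function name and parameters are invented for the example.

```python
def write_succeeds(acks_received: int, write_quorum: int, ack_quorum: int) -> bool:
    """Illustrative check: a write completes once ack_quorum bookies confirm it.

    The consistency condition described in the article requires the Ack Quorum
    to be a majority of the Write Quorum (A > W/2), so that any two
    acknowledged writes overlap in at least one Bookie.
    """
    assert ack_quorum > write_quorum / 2, "A > W/2 must hold for strong consistency"
    return acks_received >= ack_quorum

# With W=3 and A=2: two acknowledgments suffice; one does not.
print(write_succeeds(2, 3, 2))  # True
print(write_succeeds(1, 3, 2))  # False
```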

Journal and BookKeeper persistence

Each Bookie can be configured to force a flush to a journal (write‑ahead log). The journal records every ledger update before it is persisted to the EntryLog. A dedicated sync thread rolls the journal file based on its size. EntryLog and Index entries are first cached in memory and flushed to disk periodically. If a Bookie crashes before flushing, recovery uses the LastLogMark stored in the journal to resume from the correct position.
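The journal behavior described above is controlled per Bookie. A minimal sketch of the relevant `bookkeeper.conf` settings, assuming standard BookKeeper option names (verify against your deployed version; the directory path is a placeholder):

```
# bookkeeper.conf (illustrative excerpt)
journalDirectory=/mnt/journal      # dedicated disk for the write-ahead log
journalSyncData=true               # fsync the journal before acknowledging writes
journalMaxSizeMB=2048              # roll the journal file once it reaches this size
flushInterval=60000                # periodic flush of cached entry-log/index data (ms)
```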

Operational tuning of quorum and journal

Operators balance cost and reliability by configuring Write Quorum, Ack Quorum, and journal settings. For high‑volume services (e.g., security), the journal is often disabled, relying solely on replica redundancy for data durability.
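As a sketch of the trade-off, the default quorum sizes are set in the broker configuration and journal syncing in the Bookie configuration. Setting names follow Pulsar's `broker.conf` and BookKeeper's `bookkeeper.conf`; confirm them for your version before applying:

```
# broker.conf: default quorum sizes for new managed ledgers
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2

# bookkeeper.conf: trade durability for throughput in high-volume scenarios
# by not fsyncing the journal on every write (relies on replica redundancy)
journalSyncData=false
```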

Rack‑aware Ensemble Placement Policy

The RackawareEnsemblePlacementPolicy selects Bookies from different racks based on network topology, ensuring that a Write Quorum spans at least two racks. Example topology:

root
├─region-a
│  └─rack-1
│     ├─bk1
│     └─bk2
└─region-b
   └─rack-2
      ├─bk3
      └─bk4

Operational notes:

ZooKeeper must be deployed across three availability zones, with at least two of those zones located in the racks used for Pulsar.

Each network partition must be able to handle a temporary traffic surge of up to twice the normal load.
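Wiring up the topology above involves enabling the rack-aware policy on the brokers and registering each Bookie's rack. The following is a sketch using Pulsar's standard `broker.conf` settings and the `pulsar-admin bookies set-bookie-rack` command; hostnames and rack paths are placeholders matching the example topology:

```shell
# broker.conf (illustrative): enable rack-aware placement and require a
# write quorum to span at least two racks
#   bookkeeperClientRackawarePolicyEnabled=true
#   bookkeeperClientMinNumRacksPerWriteQuorum=2
#   bookkeeperClientEnforceMinNumRacksPerWriteQuorum=true

# Register each Bookie's position in the topology
bin/pulsar-admin bookies set-bookie-rack \
  --bookie bk1:3181 --hostname bk1.example.com:3181 \
  --group default --rack /region-a/rack-1
```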

Cross‑region (GEO) replication

Pulsar’s built‑in GEO replication copies data between clusters in different physical regions. In a two‑cluster setup (Beijing ↔ Shanghai), the workflow is:

Producer writes to a local topic in the source cluster.

A replication cursor is created to track progress.

A replication producer reads from the source topic and writes to the corresponding topic in the remote cluster.

Consumers in the remote region read the replicated messages.
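The workflow above is configured with `pulsar-admin`: register the remote cluster, allow the tenant to span both clusters, then enable replication on the namespace. Cluster names, tenant/namespace, and URLs below are illustrative:

```shell
# On the Beijing cluster: register the Shanghai cluster's endpoints
bin/pulsar-admin clusters create shanghai \
  --url http://pulsar-sh.example.com:8080 \
  --broker-url pulsar://pulsar-sh.example.com:6650

# Allow the tenant to span both clusters, then enable replication per namespace
bin/pulsar-admin tenants create ads --allowed-clusters beijing,shanghai
bin/pulsar-admin namespaces set-clusters ads/dr --clusters beijing,shanghai
```

Once the namespace's replication clusters are set, topics under it are replicated automatically.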

Limitations:

Message ordering is guaranteed only within a single data center; global ordering across regions is not provided.

Cursor snapshots are taken periodically, so their timestamps may have slight deviations.

Dual‑write and dual‑consume pattern

To achieve stricter DR, each region can run two independent Pulsar clusters. Writes are duplicated to both clusters (dual‑write) and each cluster is consumed independently (dual‑consume). If one cluster fails, the other continues serving traffic, but consumers must perform de‑duplication to avoid processing duplicate messages.
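The consumer-side de-duplication mentioned above can be sketched as follows. This assumes producers attach a unique, business-level message ID so the same payload arriving from both clusters can be detected; the class and ID scheme are invented for the example:

```python
class DeduplicatingConsumer:
    """Sketch of consumer-side de-duplication for a dual-write setup.

    Assumes every message carries a unique ID (e.g. a business sequence
    number) shared by both copies written to the two clusters.
    """

    def __init__(self):
        self._seen = set()

    def process(self, message_id: str, payload: bytes) -> bool:
        """Return True if the message was handled, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        # ... hand payload to the business logic here ...
        return True

consumer = DeduplicatingConsumer()
# The same message arrives once from each cluster; only the first copy is handled.
print(consumer.process("order-1001", b"charge"))  # True
print(consumer.process("order-1001", b"charge"))  # False
```

A production version would bound the seen-ID set (e.g. a TTL or sliding window keyed by sequence range) rather than growing it indefinitely.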

Billing scenario deployment

Billing pipelines require exactly‑once processing and end‑to‑end traceability, making cross‑city double writes unacceptable. The architecture adds a stateless proxy layer in front of Pulsar clusters, with L5 load balancers routing traffic. When a Pulsar cluster becomes unavailable, the corresponding L5 node is disabled; unconsumed messages are replayed after the cluster recovers, and auxiliary tools back up data locally.

Future direction and platform‑side optimizations

The goal is to shift all DR responsibilities to Pulsar so that business services remain oblivious to failover logic. Upcoming releases aim to provide:

Multi‑cluster SDK support for both producers and consumers, enabling automatic failover.

Tighter synchronization of replication cursors between clusters.

Version 2.10 already supports multi‑cluster producer configuration; consumer‑side multi‑cluster support is planned for a future release.

Message Queues · Apache Pulsar · Rack Awareness · Operational Practices
Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
