
Mastering Apache Pulsar Geo‑Replication: Modes, Configs, and Common Pitfalls

Apache Pulsar's built‑in Geo‑Replication lets clusters in different regions synchronize data, in either synchronous or asynchronous mode. This guide explains the three asynchronous patterns (full‑mesh, unidirectional, and failover), covering the configuration each one requires, how it works, and its current limitations.

Tencent Cloud Middleware

Overview

Apache Pulsar is a multi‑tenant, high‑performance messaging platform that supports low latency, read/write separation, cross‑region replication, rapid scaling, and flexible fault tolerance. Its native Geo‑Replication lets clusters in different physical locations replicate data.

Why Geo‑Replication Matters

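With Geo‑Replication, services can be deployed across multiple data centers to withstand the loss of an entire site: if one site goes down, traffic can be switched to another. With asynchronous replication, messages that had not yet been copied at the moment of failure may be unavailable on the surviving site until the failed one recovers.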

Replication Modes

Based on whether replication is synchronous or asynchronous, two high‑level approaches exist:

Synchronous mode: Guarantees strong durability by writing to replicas in different cities before acknowledging the client, so cross‑region latency and network jitter add directly to publish latency.

Asynchronous mode: Writes locally first, then copies to remote sites in the background. Producer latency is preserved, at the cost of extra storage and only eventual consistency between sites.

Asynchronous Geo‑Replication Options

This article focuses on asynchronous replication and covers three architectural patterns:

Full‑mesh (all clusters replicate to each other)

Unidirectional replication

Failover mode

These patterns can be further divided by the presence of a global configuration store (configurationStoreServers, i.e., a global ZooKeeper):

With configurationStoreServers – only Full‑mesh is supported.

Without configurationStoreServers – Unidirectional and Failover are available.

Key Configuration Items

When initializing a Pulsar cluster, the following parameters must be supplied:

cluster (cluster name)

zookeeper (local ZooKeeper servers)

configuration-store (global ZooKeeper servers, optional)

web-service-url / web-service-url-tls

broker-service-url / broker-service-url-tls

bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper zk1.us-west.example.com:2181 \
  --configuration-store zk1.us-west.example.com:2181 \
  --web-service-url http://pulsar.us-west.example.com:8080 \
  --web-service-url-tls https://pulsar.us-west.example.com:8443 \
  --broker-service-url pulsar://pulsar.us-west.example.com:6650 \
  --broker-service-url-tls pulsar+ssl://pulsar.us-west.example.com:6651

Full‑Mesh Replication

In a full‑mesh, messages produced in any cluster are replicated to every other cluster, so producers and consumers can attach to whichever cluster is closest. Data flow is illustrated in the diagram below. To avoid infinite replication loops, Pulsar marks each replicated message with a replicated_from field naming the cluster it came from, and brokers do not forward messages that already carry this field.

Figure: full‑mesh replication diagram
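With a shared configuration store, enabling full‑mesh for a namespace mostly comes down to listing every cluster in the tenant's allowed clusters and in the namespace's replication clusters. The following is a minimal sketch, not a definitive procedure: the cluster names (us-west, us-east, us-cent), the tenant, and the namespace are hypothetical, and it assumes all three clusters are already registered in the shared configuration store. It only prints the commands unless PULSAR_ADMIN is set.

```shell
#!/bin/sh
# Sketch: enable full-mesh replication for one namespace.
# Assumptions (not from the article): cluster, tenant and namespace names are
# hypothetical, and all clusters share one configuration store.
# Dry run by default; set PULSAR_ADMIN=bin/pulsar-admin to execute for real.
PULSAR_ADMIN="${PULSAR_ADMIN:-echo bin/pulsar-admin}"

CLUSTERS="us-west,us-east,us-cent"
TENANT="my-tenant"
NAMESPACE="$TENANT/my-namespace"

enable_full_mesh() {
  # The tenant must be allowed to use every cluster in the mesh.
  $PULSAR_ADMIN tenants create "$TENANT" --allowed-clusters "$CLUSTERS"
  # Create the namespace, then list every cluster as a replication target.
  $PULSAR_ADMIN namespaces create "$NAMESPACE"
  $PULSAR_ADMIN namespaces set-clusters "$NAMESPACE" --clusters "$CLUSTERS"
}

enable_full_mesh
```

Because the policy lives in the shared configuration store, running this once against any cluster should be enough for all of them to pick it up.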

Unidirectional Replication

When no global ZooKeeper is used, each cluster points configurationStoreServers at its own local ZooKeeper address, so replication policies are set per cluster rather than shared. Unidirectional replication is then configured by enabling replication only on the source cluster: data flows from the source cluster to a downstream cluster and never back, which reduces network traffic and storage overhead.

Figure: unidirectional replication diagram
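One plausible way to set this up is sketched below, assuming two clusters with separate configuration stores. The cluster names (cluster-a as source, cluster-b as destination), URLs, tenant, and namespace are all hypothetical, and the exact steps should be checked against the documentation for your Pulsar version. The script prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch: one-way replication from cluster-a to cluster-b.
# Assumptions (not from the article): names and URLs are hypothetical, and
# each cluster uses its own local configuration store.
# Dry run by default; set PULSAR_ADMIN=bin/pulsar-admin to execute for real.
PULSAR_ADMIN="${PULSAR_ADMIN:-echo bin/pulsar-admin}"

A_ADMIN_URL="http://pulsar.cluster-a.example.com:8080"
B_ADMIN_URL="http://pulsar.cluster-b.example.com:8080"
B_BROKER_URL="pulsar://pulsar.cluster-b.example.com:6650"
NAMESPACE="my-tenant/my-namespace"

configure_source() {
  # cluster-a must know how to reach cluster-b before it can replicate to it.
  $PULSAR_ADMIN --admin-url "$A_ADMIN_URL" clusters create cluster-b \
    --url "$B_ADMIN_URL" --broker-url "$B_BROKER_URL"
  $PULSAR_ADMIN --admin-url "$A_ADMIN_URL" tenants create my-tenant \
    --allowed-clusters cluster-a,cluster-b
  $PULSAR_ADMIN --admin-url "$A_ADMIN_URL" namespaces create "$NAMESPACE" \
    --clusters cluster-a,cluster-b
}

configure_destination() {
  # cluster-b lists only itself, so nothing is replicated back to cluster-a.
  $PULSAR_ADMIN --admin-url "$B_ADMIN_URL" tenants create my-tenant \
    --allowed-clusters cluster-b
  $PULSAR_ADMIN --admin-url "$B_ADMIN_URL" namespaces create "$NAMESPACE" \
    --clusters cluster-b
}

configure_source
configure_destination
```

The asymmetry is the whole point: only the source cluster's namespace policy names a remote cluster, so only the source starts a replicator.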

Failover Mode

Failover is a special case of unidirectional replication. The remote cluster acts as a standby replica with no producers or consumers attached. If the active cluster fails, producers and consumers are switched to the standby cluster. Subscription state is replicated as well, through the replicated subscriptions feature, so consumers can resume close to where they left off.

Figure: failover mode diagram

Current Limitations

Only per‑data‑center message ordering is guaranteed; global ordering across sites is not supported.

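Cursor snapshots are taken periodically, so the replicated subscription position can lag behind the source; after a failover, consumers may receive some messages they had already acknowledged.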

Only the “mark delete position” is synchronized; individual message acknowledgments are not.

All clusters must be online for a cursor snapshot to succeed.

Snapshotting introduces cache overhead that can affect backlog calculations.
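To see what the mark-delete limitation above means in practice, here is a small illustrative sketch. It is a toy model, not Pulsar code: message positions are simplified to consecutive integers starting at 0 (real positions are ledger and entry IDs), and the function name is hypothetical. The mark-delete position is the highest position below which every message has been acknowledged, so any acknowledgment beyond a gap is not reflected in it.

```shell
#!/bin/sh
# Toy model: compute the mark-delete position from a set of acknowledged ids.
# Assumption: ids are consecutive integers starting at 0 (a simplification).

# mark_delete_position ID...
# Prints the highest id N such that every id in 0..N is acknowledged,
# or -1 if id 0 has not been acknowledged yet.
mark_delete_position() {
  pos=-1
  while :; do
    next=$((pos + 1))
    case " $* " in
      *" $next "*) pos=$next ;;   # next id is acked: advance
      *) break ;;                 # first gap found: stop
    esac
  done
  echo "$pos"
}

# Messages 0, 1, 2, 4 and 5 are acked; message 3 is still outstanding.
mark_delete_position 0 1 2 4 5
```

In this example the function prints 2. Only that position is synchronized, so after a failover the standby cluster would redeliver messages 4 and 5 along with message 3, even though the consumer had already acknowledged them.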

