How Pulsar GEO Replication Keeps Subscription State in Sync Across Clusters

This article explains how Apache Pulsar's GEO Replication synchronizes subscription state between primary and backup clusters, detailing the challenges of ledger mismatches, the two‑step mapping process for MessageIds, and the mechanisms that ensure seamless failover without duplicate consumption.

Tencent Cloud Middleware
Tencent Cloud Middleware
Tencent Cloud Middleware
How Pulsar GEO Replication Keeps Subscription State in Sync Across Clusters

Introduction

Apache Pulsar is a multi‑tenant, high‑performance messaging solution that supports GEO Replication, allowing data and subscription state to be replicated across clusters. The article focuses on how subscription state (MarkDeletePosition, MDP) is synchronized in GEO scenarios.

GEO Replication Overview

GEO Replication enables data copy between multiple clusters. In the illustrated setup, three clusters (A, B, C) have different replication relationships: A↔B (bidirectional), A→C (unidirectional), and B↔C (no replication). While data replication is common, synchronizing subscription state is a separate challenge.

Subscription State Sync Scenario

In disaster‑recovery setups, the primary cluster handles writes and consumption; when it fails, the backup takes over. Writing can continue seamlessly, but consumption may duplicate messages if subscription state is not synchronized. The key piece of state is the MarkDeletePosition (MDP) of each subscription.

Challenges with Ledger Mapping

Each topic partition’s ledger IDs differ between clusters (e.g., Ledger‑x in the primary vs. Ledger‑y in the backup). Because the ledgers are not one‑to‑one, MDP cannot be directly mapped; a translation mechanism is required.

Two‑Step Sync Process

Step 1: Map MessageIds (offsets) between clusters. When a message is replicated, its MessageId in the primary must be associated with the corresponding MessageId in the backup.

Step 2: When the primary’s MDP moves, use the MessageId mapping to translate the new MDP into the backup’s context and update the backup subscription.

MessageId Mapping Implementation

Pulsar avoids storing a massive per‑message mapping table. Instead, a periodic task snapshots the Last Acknowledged Cursor (LAC) of each topic in the primary and replicates it to the backup.

Workflow:

Cluster A periodically creates a SnapshotRequest and writes it to a local topic.

The request is replicated to Cluster B.

Cluster B reads the request, packages its local LAC (LAC‑B) into a SnapshotResponse, and writes it back to its local topic, which is then replicated to Cluster A.

Cluster A records the mapping between its local MessageId (the ID of the SnapshotResponse) and LAC‑B. Because the round‑trip order is guaranteed, any MDP greater than this local MessageId implies that the corresponding LAC‑B data has already been consumed.

Synchronizing Subscription Updates

When the primary’s subscription acknowledges messages and the MDP advances, it checks whether the new MDP exceeds the stored local MessageId. If so, it looks up the corresponding LAC‑B, constructs a ReplicatedSubscriptionsUpdate message containing the updated MDP, and writes it to the local topic. This message is replicated to the backup.

The backup receives the update, extracts the LAC and subscription info, and applies the new MDP, completing one synchronization cycle.

Summary and Future Considerations

The current implementation relies on bidirectional GEO Replication; unidirectional setups cannot synchronize subscription state. Moreover, only MDP is synchronized, while other important metadata such as IndividuallyDeletedMessages (used by Shared and Key‑Shared subscriptions) is ignored, potentially causing duplicate consumption.

To handle this, two enhancements are suggested:

Record a mapping of every MessageId between primary and backup (e.g., embed the original MessageId in message properties).

Replicate IndividuallyDeletedMessages to the backup so it can determine whether a message has already been acked.

With these extensions, a backup cluster can accurately determine which messages have been processed and avoid re‑sending them to consumers during failover.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Apache PulsarDistributed MessagingGEO ReplicationMessageId MappingSubscription Sync
Tencent Cloud Middleware
Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.