How Apache Pulsar Enables Seamless Cross‑Cluster Tenant Migration with Replication and Subscription Sync

This article explains how Apache Pulsar’s cross‑region data replication and subscription‑progress synchronization are implemented and optimized, and how they are combined with the Lookup Service to achieve reliable tenant migration across clusters in a cloud‑native environment.

Tencent Cloud Middleware

Background

This article is based on a talk given at Pulsar Summit Asia 2022 by Han Mingze, a senior engineer on the Tencent Cloud middleware team. It describes the design, implementation, and optimization of cross‑region replication and subscription‑progress synchronization, and how they address the challenges of migrating tenants across clusters.

Cross‑Region Replication in Pulsar

Pulsar provides built‑in cross‑region (cross‑datacenter) replication, enabling scenarios such as disaster‑recovery backup and remote read/write. In a typical setup, producers write to the upstream cluster while consumers read from the downstream cluster; replication copies messages asynchronously between clusters without affecting local production or consumption.

Subscription Progress Synchronization

Beyond replicating messages, Pulsar can synchronize subscription consumption progress. This is crucial for disaster recovery: if the Beijing cluster fails, a tenant can switch to the Shanghai cluster and resume consuming from the synchronized position, without losing messages or re‑consuming ones that were already acknowledged.

Consumption progress in Pulsar consists of two parts: markDeletePosition (similar to a Kafka offset: every message up to and including it has been acknowledged) and individuallyDeletedMessages, which records messages acknowledged individually beyond that position. To synchronize progress, the Replication module creates internal subscriptions that publish cursor snapshots to the other clusters.
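To make the two parts of consumption progress concrete, here is a minimal Python sketch of a cursor. It is a toy model, not Pulsar's actual cursor implementation: the class name CursorModel and its methods are hypothetical, and positions are simplified to (ledger_id, entry_id) tuples.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Position = Tuple[int, int]  # (ledger_id, entry_id); tuples sort ledger-first


@dataclass
class CursorModel:
    """Toy subscription cursor (hypothetical, for illustration only)."""
    log: List[Position]                           # every position in the topic, in order
    mark_delete: Optional[Position] = None        # all messages up to here are acked
    individually_deleted: Set[Position] = field(default_factory=set)

    def _first_unmarked_index(self) -> int:
        return 0 if self.mark_delete is None else self.log.index(self.mark_delete) + 1

    def ack(self, pos: Position) -> None:
        """Ack one message, then advance mark_delete over any contiguous acked run."""
        self.individually_deleted.add(pos)
        for p in self.log[self._first_unmarked_index():]:
            if p not in self.individually_deleted:
                break  # first hole: mark_delete cannot move past it
            self.mark_delete = p
            self.individually_deleted.discard(p)

    def unacked(self) -> List[Position]:
        """Messages that would still be delivered to the subscription."""
        return [p for p in self.log[self._first_unmarked_index():]
                if p not in self.individually_deleted]


cursor = CursorModel(log=[(1, 0), (1, 1), (1, 2), (1, 3)])
cursor.ack((1, 0))   # contiguous: mark_delete moves to (1, 0)
cursor.ack((1, 2))   # out of order: recorded individually, (1, 1) is a hole
```

After these two acknowledgments the cursor holds mark_delete = (1, 0) and individually_deleted = {(1, 2)}. Acknowledging (1, 1) closes the hole and moves mark_delete to (1, 2).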

Challenges in Sync

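Only markDeletePosition is synchronized; individuallyDeletedMessages is not. Messages acknowledged out of order beyond markDeletePosition (acknowledgment holes, which are common with delayed or scheduled messages) are therefore unknown to the target cluster and can be consumed again after a switch.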

Message IDs differ across clusters (e.g., 1:0 in cluster A vs 3:0 in cluster B), making direct mapping impossible without additional metadata.

Message backlog can block snapshot construction, causing sync failures.
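The first two challenges can be illustrated with a small Python model. It assumes two clusters, A and B, where B assigns its own ledger IDs to replicated entries, and where only the mark-delete position is carried over. The function names are hypothetical and the model ignores batching, partitions, and timing.

```python
from typing import Dict, FrozenSet, List, Tuple

Position = Tuple[int, int]  # (ledger_id, entry_id)


def replicate(source_log: List[Position], target_ledger_id: int) -> Dict[Position, Position]:
    """Copy entries to another cluster; the target assigns its own IDs."""
    return {src: (target_ledger_id, i) for i, src in enumerate(source_log)}


def delivered_after_failover(target_log: List[Position],
                             target_mark_delete: Position,
                             target_acked: FrozenSet[Position] = frozenset()) -> List[Position]:
    """Everything after mark-delete that the target does not know to be acked."""
    return [p for p in target_log if p > target_mark_delete and p not in target_acked]


# Cluster A: four messages in ledger 1. Cluster B stores the copies in ledger 3.
a_log = [(1, 0), (1, 1), (1, 2), (1, 3)]
a_to_b = replicate(a_log, target_ledger_id=3)
b_log = list(a_to_b.values())

# Consumer on A acked 0, 2 and 3; message 1 is delayed, leaving a hole.
a_mark_delete = (1, 0)
a_individually_deleted = {(1, 2), (1, 3)}

# Only mark-delete is synchronized, translated into B's ID space.
b_mark_delete = a_to_b[a_mark_delete]

redelivered = delivered_after_failover(b_log, b_mark_delete)
duplicates = [a_to_b[p] for p in sorted(a_individually_deleted)]
```

In this model cluster B delivers (3, 1), (3, 2) and (3, 3) after the switch, although the last two correspond to messages that were already acknowledged on cluster A. The translation step also only works because the model keeps an explicit A-to-B mapping, which is exactly what the differing message IDs make hard to obtain.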

Optimization Techniques

To address these issues, when a message is replicated from cluster A to cluster B the implementation writes its position in the original cluster (its EntryPosition, stored as originalClusterPosition) into the message metadata. When the message is consumed in cluster B, this metadata maps it back to its cluster A ID, so that messages already recorded in cluster A's individuallyDeletedMessages can be filtered out.
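A minimal Python sketch of this idea follows. It assumes each replicated message carries its source position in a metadata field, modeled here as original_position, and that cluster B has received cluster A's mark-delete position and individually acknowledged set. The class and function names are hypothetical and are not Pulsar APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

Position = Tuple[int, int]  # (ledger_id, entry_id)


@dataclass(frozen=True)
class ReplicatedMessage:
    local_position: Position               # ID assigned by cluster B
    payload: bytes
    original_position: Optional[Position]  # assumed metadata: ID in cluster A, None if produced locally


def replicate_with_origin(source_entries: Iterable[Tuple[Position, bytes]],
                          target_ledger_id: int) -> List[ReplicatedMessage]:
    """Copy entries to cluster B and stamp each one with its cluster A position."""
    return [ReplicatedMessage((target_ledger_id, i), payload, src)
            for i, (src, payload) in enumerate(source_entries)]


def deliverable(messages: Iterable[ReplicatedMessage],
                source_mark_delete: Position,
                source_individually_deleted: Set[Position]) -> List[ReplicatedMessage]:
    """Drop messages that cluster A already saw acknowledged."""
    pending = []
    for msg in messages:
        origin = msg.original_position
        acked_in_source = origin is not None and (
            origin <= source_mark_delete or origin in source_individually_deleted)
        if not acked_in_source:
            pending.append(msg)
    return pending


entries = [((1, i), f"m{i}".encode()) for i in range(4)]
in_cluster_b = replicate_with_origin(entries, target_ledger_id=3)
pending = deliverable(in_cluster_b,
                      source_mark_delete=(1, 0),
                      source_individually_deleted={(1, 2), (1, 3)})
```

With the source position available on every message, only the message that was still unacknowledged in cluster A, here the copy at (3, 1), remains deliverable in cluster B.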

Example request payload:

{
    "snapshot_id":"444D3632-F96C-48D7-83DB-041C32164EC1",
    "source_cluster":"a"
}

Example response payload:

{
    "snapshot_id":"444D3632-F96C-48D7-83DB-041C32164EC1",
    "cluster":{
        "cluster":"b",
        "message_id":{
            "ledger_id":1234,
            "entry_id":45678
        }
    }
}

After constructing the cursor snapshot, Pulsar records the correspondence between Message IDs of all involved clusters, enabling accurate progress updates.
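How a finished snapshot can drive a progress update is sketched below in Python. The sketch assumes the response carries the same snapshot_id key as the request and that each snapshot is stored together with the source-cluster position it refers to (local_position, an assumption). The names CursorSnapshot, build_snapshot and remote_mark_delete are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

Position = Tuple[int, int]  # (ledger_id, entry_id)


@dataclass
class CursorSnapshot:
    snapshot_id: str
    local_position: Position               # assumed: source-cluster position of the snapshot
    remote_positions: Dict[str, Position]  # cluster name -> position in that cluster


def build_snapshot(snapshot_id: str, local_position: Position,
                   responses: Iterable[dict]) -> CursorSnapshot:
    """Collect the positions reported by other clusters for one snapshot round."""
    remote: Dict[str, Position] = {}
    for resp in responses:
        if resp["snapshot_id"] != snapshot_id:
            continue  # response belongs to a different snapshot round
        cluster = resp["cluster"]
        msg_id = cluster["message_id"]
        remote[cluster["cluster"]] = (msg_id["ledger_id"], msg_id["entry_id"])
    return CursorSnapshot(snapshot_id, local_position, remote)


def remote_mark_delete(snapshots: List[CursorSnapshot], local_mark_delete: Position,
                       cluster: str) -> Optional[Position]:
    """Newest snapshot at or before local mark-delete gives the remote position."""
    usable = [s for s in snapshots
              if s.local_position <= local_mark_delete and cluster in s.remote_positions]
    if not usable:
        return None  # mark-delete has not yet passed any snapshot
    return max(usable, key=lambda s: s.local_position).remote_positions[cluster]


response = {
    "snapshot_id": "444D3632-F96C-48D7-83DB-041C32164EC1",
    "cluster": {"cluster": "b", "message_id": {"ledger_id": 1234, "entry_id": 45678}},
}
snap = build_snapshot("444D3632-F96C-48D7-83DB-041C32164EC1", (1, 100), [response])
```

In this sketch nothing is sent to cluster b while the local mark-delete position is still before (1, 100). Once it passes that point, cluster b's cursor can be moved to (1234, 45678). This also shows why a backlog that prevents snapshots from completing stalls the synchronization: without a usable snapshot there is no position to translate to.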

Tenant Cross‑Cluster Migration Architecture

The migration relies on the Lookup Service, which maps tenants to physical clusters. Updating the Lookup mapping and then unloading the old mapping forces clients to reconnect and be redirected to the new cluster. The service also proxies metadata requests (e.g., getPartitionState), while data traffic goes directly to the brokers.
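The redirection mechanism can be modeled in a few lines of Python. This is a toy directory, not Pulsar's lookup protocol: the LookupService and Client classes, the cluster names, and the example URLs are all hypothetical.

```python
from typing import Dict, Optional


class LookupService:
    """Toy tenant -> cluster directory (hypothetical, for illustration)."""

    def __init__(self, tenant_to_cluster: Dict[str, str], cluster_urls: Dict[str, str]):
        self.tenant_to_cluster = dict(tenant_to_cluster)
        self.cluster_urls = dict(cluster_urls)

    def resolve(self, topic: str) -> str:
        # Topic names look like persistent://tenant/namespace/topic
        tenant = topic.split("://", 1)[1].split("/", 1)[0]
        return self.cluster_urls[self.tenant_to_cluster[tenant]]

    def remap(self, tenant: str, cluster: str) -> None:
        self.tenant_to_cluster[tenant] = cluster


class Client:
    """Caches the resolved broker URL until the topic is unloaded."""

    def __init__(self, lookup: LookupService, topic: str):
        self.lookup = lookup
        self.topic = topic
        self.broker_url: Optional[str] = None

    def connect(self) -> str:
        if self.broker_url is None:
            self.broker_url = self.lookup.resolve(self.topic)
        return self.broker_url

    def on_unload(self) -> None:
        # An unload drops the connection, so the next connect() looks up again.
        self.broker_url = None


lookup = LookupService(
    {"tenant-a": "cluster-a"},
    {"cluster-a": "pulsar://cluster-a.example:6650",
     "cluster-b": "pulsar://cluster-b.example:6650"},
)
client = Client(lookup, "persistent://tenant-a/orders/created")
before = client.connect()
lookup.remap("tenant-a", "cluster-b")
still_cached = client.connect()   # mapping changed, but the client has not re-resolved
client.on_unload()
after = client.connect()
```

The model shows why both actions are needed: changing the mapping alone leaves connected clients on the old cluster, and the unload is what makes them look up the tenant again.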

Migration Procedure

1. Synchronize metadata (tenants, namespaces, topics, subscriptions) to the target cluster.

2. Enable cross‑region replication and replicate tenant topics.

3. Before switching clusters, enable subscription‑progress sync to copy both markDeletePosition and individuallyDeletedMessages to the target.

4. Update the Lookup Service mapping and trigger an unload so clients re‑resolve to the new cluster.

5. After migration, clean up resources in the original cluster.
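The ordering of the procedure can be sketched as a small Python driver that runs the steps in sequence and stops at the first one that fails, so that the Lookup switch and the cleanup never run on top of an incomplete earlier step. The step names and the run_migration function are hypothetical. The real actions, such as admin calls and Lookup updates, are passed in as callables and are not shown.

```python
from typing import Callable, Dict, List

# Step names mirror the procedure above; the names themselves are illustrative.
MIGRATION_STEPS = [
    "sync_metadata",
    "enable_replication",
    "enable_subscription_sync",
    "switch_lookup_and_unload",
    "cleanup_source",
]


def run_migration(actions: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each step in order; stop at the first step that reports failure."""
    completed: List[str] = []
    for step in MIGRATION_STEPS:
        if not actions[step]():
            break  # later steps depend on this one, so do not continue
        completed.append(step)
    return completed


calls: List[str] = []


def recording(step: str, result: bool) -> Callable[[], bool]:
    """Build a stand-in action that records the call and returns `result`."""
    def action() -> bool:
        calls.append(step)
        return result
    return action


actions = {step: recording(step, True) for step in MIGRATION_STEPS}
actions["enable_subscription_sync"] = recording("enable_subscription_sync", False)
completed = run_migration(actions)
```

In this run the subscription sync step fails, so only the first two steps complete and clients are never redirected to a cluster whose subscription state is incomplete.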

Conclusion

The solution presented here achieves tenant cross‑cluster migration with minimal code changes. It builds on Pulsar’s native replication and subscription‑progress synchronization and uses the Lookup Service to redirect clients seamlessly, giving cloud‑native messaging workloads a reliable, low‑cost migration path.
