How Tencent Cloud Optimized Apache Pulsar for Million‑Topic Scale
This article details Tencent Cloud's year‑long practical optimizations for Apache Pulsar, covering Ack hole impacts, TTL/Backlog/Retention strategies, delayed‑message handling, Admin API blocking, ZooKeeper and BookKeeper leaks, and multi‑level cache improvements to achieve stable, high‑performance messaging at massive scale.
Background
Apache Pulsar is a cloud‑native messaging system that separates storage and compute, supporting large clusters, multi‑tenant environments, millions of topics, cross‑region replication, and tiered storage. It offers a unified consumption model for both queue and streaming scenarios, providing strong consistency for queues and high throughput with low latency for streams.
Tencent Cloud has deployed Pulsar in production at massive scale for various industry use cases. Over the past year the team performed a series of stability and performance optimizations to ensure reliable operation under diverse workloads.
Why Switch to Pulsar?
Customers previously used Kafka, but Kafka’s architecture limited the number of topics per cluster, forcing multiple clusters and increasing cost. Pulsar’s storage‑compute separation and BookKeeper‑based tiered storage allow easy scaling of topics and offloading data to cheap storage. Pulsar Functions provide lightweight serverless processing for topic‑to‑topic routing. A typical Pulsar deployment at Tencent Cloud now handles around 600,000 topics, reducing cost while meeting business needs.
Practice 1: Ack Hole Impact and Mitigation
When using Shared subscription or single‑message Ack models, users often encounter Ack holes. Pulsar records these in the individuallyDeletedMessages collection, which uses open intervals for holes and closed intervals for acknowledged messages. Early Pulsar versions lacked Ack return values, causing holes when the broker failed to process an Ack.
Mitigation approaches include:
Accurately calculating Backlog Size (complex due to batch messages).
Broker‑side proactive compensation: each ManagedCursor holds its individuallyDeletedMessages set, allowing the broker to push missing acknowledgments to clients.
The broker’s proactive compensation mechanism periodically scans these sets, determines the markDeletePosition, and pushes updates to clients to avoid frequently triggering Producer Exceptions.
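On the client side, an application can also guard against lost acknowledgments itself: modern Pulsar clients expose acknowledgeAsync, which returns a CompletableFuture, so a failed Ack is visible and can be retried rather than silently becoming a hole. A minimal sketch, assuming placeholder service URL, topic, and subscription names:

import org.apache.pulsar.client.api.*;

import java.util.concurrent.TimeUnit;

public class AckRetryExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")            // placeholder address
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/demo-topic")  // placeholder topic
                .subscriptionName("demo-sub")                     // placeholder subscription
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> msg = consumer.receive(5, TimeUnit.SECONDS);
        if (msg != null) {
            // acknowledgeAsync returns a CompletableFuture, so a failed ack is visible
            // to the application and can be retried instead of silently leaving a hole.
            consumer.acknowledgeAsync(msg.getMessageId())
                    .exceptionally(ex -> {
                        consumer.acknowledgeAsync(msg.getMessageId()); // one simple retry
                        return null;
                    });
        }

        consumer.close();
        client.close();
    }
}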
Practice 2: TTL, Backlog, and Retention Strategies
Key concepts:
TTL: Messages not Acked within a configured time are auto‑Acked by the broker.
Backlog: The gap between produced and consumed messages.
Retention: The duration for which a ledger is kept after its messages are Acked.
When both TTL and Retention are set, the message lifecycle follows these rules:
If TTL < Retention, lifecycle = TTL + Retention.
If TTL ≥ Retention, lifecycle = TTL.
For example, with a TTL of 5 days and a Retention of 7 days, a message that is never Acked is auto‑Acked after 5 days and its ledger is then retained for another 7 days, giving a total lifecycle of roughly 12 days.
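For reference, both policies can be set per namespace with the Java admin client; a minimal sketch with placeholder endpoint, namespace, and values:

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

import java.util.concurrent.TimeUnit;

public class TtlRetentionConfig {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin endpoint
                .build();

        String namespace = "public/default";               // placeholder namespace

        // TTL: unacked messages are auto-acked by the broker after 5 days.
        admin.namespaces().setNamespaceMessageTTL(namespace, (int) TimeUnit.DAYS.toSeconds(5));

        // Retention: keep acked data for 7 days, with no size limit (-1).
        admin.namespaces().setRetention(namespace,
                new RetentionPolicies((int) TimeUnit.DAYS.toMinutes(7), -1));

        admin.close();
    }
}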
On the broker side, the TTL expiration check is implemented roughly as follows (simplified for clarity):
public boolean expireMessages(int messageTTLInSeconds) {
    if (expirationCheckInProgressUpdater.compareAndSet(this, FALSE, TRUE)) {
        log.info("[{}][{}] Starting message expiry check, ttl= {} seconds", topicName, subName,
                messageTTLInSeconds);
        cursor.asyncFindNewestMatching(ManagedCursor.FindPositionConstraint.SearchActiveEntries, entry -> {
            try {
                long entryTimestamp = Commands.getEntryTimestamp(entry.getDataBuffer());
                return MessageImpl.isEntryExpired(messageTTLInSeconds, entryTimestamp);
            } catch (Exception e) {
                log.error("[{}][{}] Error deserializing message for expiry check", topicName, subName, e);
            } finally {
                entry.release();
            }
            return false;
        }, this, null);
        return true;
    }
    return false;
}
public static boolean isEntryExpired(int messageTTLInSeconds, long entryTimestamp) {
    return messageTTLInSeconds != 0
            && (System.currentTimeMillis() > entryTimestamp + TimeUnit.SECONDS.toMillis(messageTTLInSeconds));
}

TTL checks only the publish timestamp, ignoring delayed‑message settings, which can cause premature expiration of delayed messages. Tencent contributed a PR that also considers the delay interval when evaluating expiration.
Practice 3: Delayed Messages vs. TTL
In a real incident, a user sent hundreds of thousands of delayed messages with a 10‑day delay, but TTL was set to 5 days, causing all delayed messages to expire after five days. The core TTL logic checks only the publish timestamp, not the delay, leading to loss of delayed messages. The community patch adds delay‑aware checks to prevent this.
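A minimal sketch of the idea behind a delay‑aware check, assuming the broker can obtain the message’s scheduled delivery time (the deliverAtTime parameter here is hypothetical); it illustrates the concept rather than reproducing the actual patch:

import java.util.concurrent.TimeUnit;

public class DelayAwareExpiry {
    /**
     * Sketch of a delay-aware expiration check: a message is only considered
     * expired once its TTL has elapsed AND its scheduled delivery time (if any)
     * has passed, so delayed messages are not dropped before delivery.
     */
    public static boolean isEntryExpired(int messageTTLInSeconds, long publishTimestamp, long deliverAtTime) {
        long now = System.currentTimeMillis();
        boolean ttlElapsed = messageTTLInSeconds != 0
                && now > publishTimestamp + TimeUnit.SECONDS.toMillis(messageTTLInSeconds);
        boolean deliveryDue = deliverAtTime <= 0 || now >= deliverAtTime;
        return ttlElapsed && deliveryDue;
    }
}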
Practice 4: Admin API Blocking Optimization
Previous issues:
Frequent synchronous calls inside asynchronous code caused thread blockage, requiring broker restarts.
The HTTP lookup service could be blocked by such calls.
Improper use of CompletableFuture caused performance degradation.
Metadata Store thread‑pool pressure increased ZooKeeper load.
Solutions included decoupling the Metadata Store thread pool, adding service‑level listeners to locate performance bottlenecks, and introducing a 30‑second timeout that throws an exception instead of hanging indefinitely (sketched below).
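To illustrate the timeout approach, a CompletableFuture can be bounded with orTimeout (Java 9+) so a stuck metadata call fails fast instead of blocking a thread; readMetadataAsync below is a placeholder for the real metadata‑store call:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MetadataTimeoutExample {

    // Placeholder for an asynchronous metadata-store read.
    static CompletableFuture<byte[]> readMetadataAsync(String path) {
        return new CompletableFuture<>(); // never completes, to simulate a hang
    }

    public static void main(String[] args) {
        CompletableFuture<byte[]> read = readMetadataAsync("/admin/policies/public/default")
                .orTimeout(30, TimeUnit.SECONDS); // fail after 30 seconds instead of hanging forever

        read.exceptionally(ex -> {
            if (ex instanceof TimeoutException) {
                System.err.println("Metadata read timed out; surfacing an error to the caller");
            }
            return null;
        }).join(); // block here only for demonstration purposes
    }
}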
Practice 5: ZooKeeper Node Leak
Although the number of active topics was modest, the number of ZooKeeper nodes grew dramatically (up to 5×). The leak originated from stale topic entries in the six‑level ZooKeeper path hierarchy. Mitigation steps (a cross‑check sketch follows the list):
Back up ZooKeeper data before any cleanup.
List all topic names from the ZooKeeper path.
Use pulsar-admin to verify each topic’s existence and remove entries for non‑existent topics (a fix for the leak was merged into Pulsar 2.8+).
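A minimal sketch of the cross‑check (not the merged fix): list the topic znodes under the managed‑ledgers path, compare them with the topics the admin API reports, and print candidates for manual cleanup. Endpoints, namespace, and path layout are placeholder assumptions and vary by Pulsar version:

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.zookeeper.ZooKeeper;

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleTopicNodeScan {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoints and namespace.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});
        PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl("http://localhost:8080").build();

        String namespace = "public/default";
        // Topic znodes live under the managed-ledgers tree; layout depends on the Pulsar version.
        String zkPath = "/managed-ledgers/" + namespace + "/persistent";

        // Topics the broker actually knows about (local names only; znodes may be URL-encoded).
        Set<String> liveTopics = new HashSet<>();
        for (String topic : admin.topics().getList(namespace)) {
            liveTopics.add(topic.substring(topic.lastIndexOf('/') + 1));
        }

        List<String> znodes = zk.getChildren(zkPath, false);
        for (String node : znodes) {
            if (!liveTopics.contains(node)) {
                // Candidate for cleanup; verify manually before deleting anything.
                System.out.println("Stale topic znode: " + zkPath + "/" + node);
            }
        }

        admin.close();
        zk.close();
    }
}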
Practice 6: Bookie Ledger Leak
Retention policies limit message lifetimes (typically ≤30 days), yet some ledgers persisted for hundreds of days and could not be deleted. Analysis revealed that only Retention can trigger ledger deletion; some BookKeeper CLI commands created ledgers outside Retention control. The team used LedgerInfo metadata to identify stale ledgers, verified their association with topics, and safely removed them, taking care to preserve schema information.
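A minimal sketch of the cross‑check idea, assuming a namespace small enough to enumerate: collect the ledger IDs still referenced by topics through the admin API, list the ledgers BookKeeper knows about, and flag the difference for manual review. Endpoints and the namespace are placeholders, and cursor and schema ledgers must also be excluded before deleting anything:

import org.apache.bookkeeper.client.BookKeeperAdmin;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

import java.util.HashSet;
import java.util.Set;

public class StaleLedgerScan {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl("http://localhost:8080").build();
        BookKeeperAdmin bkAdmin = new BookKeeperAdmin("localhost:2181");

        // Ledger IDs still referenced by topics in one namespace (placeholder).
        Set<Long> referenced = new HashSet<>();
        for (String topic : admin.topics().getList("public/default")) {
            PersistentTopicInternalStats stats = admin.topics().getInternalStats(topic);
            stats.ledgers.forEach(l -> referenced.add(l.ledgerId));
            // Note: cursor ledgers and schema ledgers are stored separately and must
            // also be treated as referenced before any deletion.
        }

        // Every ledger BookKeeper knows about; anything unreferenced is only a candidate for review.
        for (long ledgerId : bkAdmin.listLedgers()) {
            if (!referenced.contains(ledgerId)) {
                System.out.println("Ledger not referenced by any scanned topic: " + ledgerId);
            }
        }

        admin.close();
        bkAdmin.close();
    }
}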
Practice 7: Multi‑Level Cache Optimization
Pulsar’s original cache iterated over all segments for each read, and when offset + entrySize > segmentSize it rolled over and cleared an entire cache segment at once, causing periodic performance spikes. The team replaced this with an OHC + LRU strategy to smooth out cache behavior. Relevant code excerpts:
try {
    // Check all segments, starting from the current one and walking backward
    int size = cacheSegments.size();
    for (int i = 0; i < size; i++) {
        int segmentIdx = (currentSegmentIdx + (size - i)) % size;
        // ... read logic ...
    }
} catch (Exception e) {
    // handle
}
try {
    int offset = currentSegmentOffset.getAndAdd(entrySize);
    if (offset + entrySize > segmentSize) {
        // Roll over to the next segment
        currentSegmentIdx = (currentSegmentIdx + 1) % cacheSegments.size();
        currentSegmentOffset.set(alignedSize);
        cacheIndexes.get(currentSegmentIdx).clear();
        offset = 0;
    }
} catch (Exception e) {
    // handle
}

After applying OHC + LRU, the cache no longer caused dramatic throughput drops, resulting in stable latency and higher overall throughput.
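OHC provides the off‑heap storage side; the eviction idea itself can be illustrated with a plain LRU map. A sketch of the policy only, not the actual Tencent implementation:

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of the LRU eviction idea: entries are evicted one at a time in
 * least-recently-used order, instead of wiping a whole cache segment on rollover.
 */
public class LruEntryCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruEntryCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true enables LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict only the single least-recently-used entry
    }
}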
Summary and Outlook
The article shares Tencent Cloud’s practical experience improving Apache Pulsar stability, covering Ack hole handling, TTL/Backlog/Retention coordination, delayed‑message expiration, Admin API blocking, ZooKeeper and BookKeeper leak remediation, and cache optimization. The team continues to contribute to the community, exploring client‑side retry strategies, broker and bookie OOM mitigation, and enhanced session‑timeout handling.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.