How TDMQ Pulsar Scales Million-Message Delayed Queues with Multi-Level Time Wheels
This article explains why large-scale delayed messaging matters, identifies the bottlenecks of the Apache Pulsar community implementation, and walks through TDMQ Pulsar's three-part redesign: hierarchical time wheels, expiration re-push, and immutable message IDs. Together these support stable delayed queues at the million-message scale, with bounded memory and acknowledgment holes confined to a window of minutes around delivery time.
In distributed systems, many business actions must run at a specific future time rather than immediately, which makes delayed messaging a fundamental requirement. Typical scenarios have a predictable gap between producing a message and consuming it, so the queue can manage the wait and the business logic stays simple.
Why delayed messages are needed
When a producer attaches a delivery time to a message, the consumer automatically receives the message once that time has passed, so producer and consumer stay cleanly decoupled.
Community solution bottlenecks
As the volume of delayed messages grows from tens of thousands to millions, the Apache Pulsar community approach shows two critical limits.
2.1 Consumption progress persistence – message holes
The community implementation tracks consumption progress with two structures: MDP (MarkDeletePosition), the position up to which every message has been acknowledged, and IDM (IndividualDeletedMessages), which records the non-contiguous ranges acknowledged beyond that position. The unacknowledged gaps between those ranges are the "holes". When many delayed messages have uneven delivery times, large numbers of not-yet-due messages stay unacknowledged while later messages are consumed, so the holes multiply and IDM grows with them.
For example, if messages 1:3, 1:7, and 1:9 are still pending while every other message up to 1:8 has been acknowledged, IDM must record the intervals [(1:3,1:6], (1:7,1:8]]. If the IDM data grows too large, it cannot be fully persisted to BookKeeper; part of the consumption progress is then lost, and a topic reload causes massive duplicate consumption.
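The relationship between MDP and IDM can be illustrated with a small model. This is a sketch only, not Pulsar's cursor code: `cursor_state` is a hypothetical helper, entries are plain integers within a single ledger, and ranges follow the open-closed convention, so `(lo, hi)` here means entries `lo+1` through `hi` are acknowledged.

```python
from typing import List, Set, Tuple


def cursor_state(last_entry: int, acked: Set[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (mark_delete_position, individually_deleted_ranges) for one ledger.

    Entry IDs run from 0 to last_entry. Each range (lo, hi) is open-closed:
    entries lo+1 .. hi are acknowledged, lo itself is not covered.
    """
    # MDP is the last entry of the contiguous acknowledged prefix.
    mdp = -1
    while mdp + 1 <= last_entry and (mdp + 1) in acked:
        mdp += 1

    # Everything acknowledged beyond MDP is stored as separate ranges.
    ranges: List[Tuple[int, int]] = []
    entry = mdp + 1
    while entry <= last_entry:
        if entry in acked:
            start = entry - 1
            while entry + 1 <= last_entry and (entry + 1) in acked:
                entry += 1
            ranges.append((start, entry))
        entry += 1
    return mdp, ranges


# Entries 3, 7 and 9 are delayed messages that are not yet due.
pending = {3, 7, 9}
acked = set(range(10)) - pending
mdp, idm = cursor_state(9, acked)
print(mdp, idm)  # 2 [(3, 6), (7, 8)]
```

Each pending delayed message that sits between acknowledged messages adds one more range, which is why the IDM size tracks the number of scattered delayed messages rather than the total backlog.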
2.2 Delayed index loading – memory and rebuild time
By default, Apache Pulsar loads the entire delayed-message index into memory. As the number of delayed messages grows, both memory usage and the time needed to rebuild the index become severe bottlenecks. Newer versions introduce bucketed persistent storage, but managing the buckets and their metadata (which relies on ZooKeeper cursor properties) adds further complexity.
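To make the cost concrete, here is a toy full-memory index built on a priority queue. It is an illustrative model, not Pulsar's tracker: `InMemoryDelayedIndex` and `rebuild` are hypothetical names, and the backlog is a synthetic list of `(deliver_at_ms, ledger_id, entry_id)` tuples.

```python
import heapq
from typing import Iterable, List, Tuple

IndexEntry = Tuple[int, int, int]  # (deliver_at_ms, ledger_id, entry_id)


class InMemoryDelayedIndex:
    """Toy full-memory index: one heap entry per delayed message."""

    def __init__(self) -> None:
        self._heap: List[IndexEntry] = []

    def add(self, deliver_at_ms: int, ledger_id: int, entry_id: int) -> None:
        heapq.heappush(self._heap, (deliver_at_ms, ledger_id, entry_id))

    def pop_due(self, now_ms: int) -> List[Tuple[int, int]]:
        """Remove and return the IDs of all messages due at or before now_ms."""
        due = []
        while self._heap and self._heap[0][0] <= now_ms:
            _, ledger_id, entry_id = heapq.heappop(self._heap)
            due.append((ledger_id, entry_id))
        return due

    def __len__(self) -> int:
        return len(self._heap)


def rebuild(backlog: Iterable[IndexEntry]) -> InMemoryDelayedIndex:
    """After a topic reload the whole backlog must be rescanned to refill the heap."""
    index = InMemoryDelayedIndex()
    for deliver_at_ms, ledger_id, entry_id in backlog:
        index.add(deliver_at_ms, ledger_id, entry_id)
    return index


backlog = [(1_000 + 10 * i, 1, i) for i in range(100_000)]
index = rebuild(backlog)
size_after_rebuild = len(index)   # grows linearly with the number of delayed messages
due = index.pop_due(1_020)        # only three messages are due so far
print(size_after_rebuild, due, len(index))
```

Both the resident size and the rebuild loop scale with the total number of delayed messages, even though only a handful are due at any moment.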
These two bottlenecks together cause repeated consumption, high memory consumption, and long index‑loading latency in large‑scale delayed‑message scenarios.
TDMQ Pulsar solution: three key designs
The new design replaces the full‑memory index with an external multi‑level time‑wheel index, records only the nearest‑to‑expire delayed messages in the topic, and keeps the message ID unchanged throughout the pipeline.
3.1 Hierarchical time wheel – on‑demand index loading
The core idea is to split delayed indexes of different time spans into separate index topics. The system loads finer‑grained indexes only when they are close to delivery, preventing the entire index from residing in memory. Consequently, memory usage stays bounded regardless of whether there are 100 k or 1 M delayed messages, and completed index topics are automatically deleted.
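The on-demand loading idea can be sketched with two levels. This is a simplified model under stated assumptions, not TDMQ Pulsar's implementation: the class name `TwoLevelTimeWheel`, the hour and minute slot sizes, and the in-process dictionary standing in for external index topics are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MessageId = Tuple[int, int]        # (ledger_id, entry_id)
IndexItem = Tuple[int, MessageId]  # (deliver_at_ms, message id)


class TwoLevelTimeWheel:
    def __init__(self, coarse_ms: int = 3_600_000, fine_ms: int = 60_000) -> None:
        self.coarse_ms = coarse_ms
        self.fine_ms = fine_ms
        # Stand-in for external index topics: one list per coarse (hourly) slot.
        self.external: Dict[int, List[IndexItem]] = defaultdict(list)
        # Fine (per-minute) slots, held in memory only for coarse slots already reached.
        self.fine: Dict[int, List[IndexItem]] = defaultdict(list)

    def add(self, deliver_at_ms: int, msg_id: MessageId) -> None:
        """New indexes always go to the coarse level first."""
        self.external[deliver_at_ms // self.coarse_ms].append((deliver_at_ms, msg_id))

    def in_memory(self) -> int:
        return sum(len(items) for items in self.fine.values())

    def advance(self, now_ms: int) -> List[MessageId]:
        """Load coarse slots that have been reached, then return messages that are due."""
        current = now_ms // self.coarse_ms
        for slot in sorted(s for s in self.external if s <= current):
            # Popping models deleting the index topic once it has been expanded.
            for item in self.external.pop(slot):
                self.fine[item[0] // self.fine_ms].append(item)

        due: List[IndexItem] = []
        for fslot in sorted(f for f in self.fine if f <= now_ms // self.fine_ms):
            remaining = []
            for item in self.fine.pop(fslot):
                (due if item[0] <= now_ms else remaining).append(item)
            if remaining:
                self.fine[fslot] = remaining
        due.sort()
        return [msg_id for _, msg_id in due]


HOUR = 3_600_000
wheel = TwoLevelTimeWheel()
wheel.add(5 * 60_000, (1, 0))        # due in 5 minutes
wheel.add(30 * 60_000, (1, 1))       # due in 30 minutes
wheel.add(3 * HOUR + 1_000, (1, 2))  # due in about 3 hours

first = wheel.advance(10 * 60_000)
loaded_after_first = wheel.in_memory()
later = wheel.advance(3 * HOUR + 2_000)
print(first, loaded_after_first, later)
```

In this model the message due in three hours never occupies the fine-grained in-memory level until its hourly slot is reached, so resident memory depends on how many messages fall into the current slot, not on the total number of delayed messages.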
3.2 Expiration re‑push – compress hole impact to minute‑level
When a delayed message expires, the system reads the original delayed message from the business topic and writes it to the topic again as a new message. Consumers skip any not-yet-expired delayed messages and process the newly written ones normally. This eliminates large-scale holes, dramatically reduces the IDM data size, and limits the impact of holes to a window of minutes around the delivery time.
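The effect on consumption progress can be shown with a small simulation. It is a sketch under simplifying assumptions, not the actual TDMQ Pulsar logic: the topic is a Python list, `Entry`, `repush_expired`, and `dispatch` are hypothetical names, and every original delayed entry is skipped because its re-pushed copy is what gets delivered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    payload: str
    deliver_at_ms: Optional[int] = None  # set only on original delayed messages
    repushed: bool = False               # original already rewritten to the tail


def repush_expired(topic: List[Entry], now_ms: int) -> None:
    """Append a deliverable copy of every delayed entry whose time has come."""
    for entry in list(topic):  # iterate over a snapshot because we append below
        if (entry.deliver_at_ms is not None
                and not entry.repushed
                and entry.deliver_at_ms <= now_ms):
            entry.repushed = True
            topic.append(Entry(entry.payload))


def dispatch(topic: List[Entry], cursor: int) -> Tuple[List[str], int]:
    """Deliver from cursor to the tail; return (delivered payloads, new cursor)."""
    delivered = []
    for entry in topic[cursor:]:
        if entry.deliver_at_ms is None:
            delivered.append(entry.payload)
        # Original delayed entries are skipped and acknowledged right away,
        # so the cursor advances contiguously and no hole is left behind.
    return delivered, len(topic)


topic = [Entry("a", deliver_at_ms=5_000), Entry("b"), Entry("c", deliver_at_ms=1_000)]

round1, cursor = dispatch(topic, 0)       # only the non-delayed message
repush_expired(topic, now_ms=1_500)       # "c" comes due and is rewritten
round2, cursor = dispatch(topic, cursor)
repush_expired(topic, now_ms=6_000)       # "a" comes due and is rewritten
round3, cursor = dispatch(topic, cursor)
print(round1, round2, round3, cursor)
```

Because the cursor never waits on a delayed entry in this model, the acknowledgment state stays contiguous no matter how far in the future the delivery times are.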
3.3 Immutable message ID – zero client migration
The redesign is transparent to clients: delayed messages are sent to the business topic as before, and the end‑to‑end message ID remains unchanged across production, storage, and consumption. Operational tools such as message tracing and querying work without any additional adaptation.
Solution comparison
(Image comparing the community approach with the TDMQ Pulsar redesign, highlighting memory usage, hole size, and rebuild latency.)
Conclusion
Delayed messaging is a basic yet heavily used capability of message queues. Small-scale implementations are straightforward, but scaling to millions of delayed messages exposes inherent constraints in how the community solution manages consumption progress and loads its index. TDMQ Pulsar addresses these constraints with a systematic redesign built on three pillars: hierarchical time wheels, expiration re-push, and immutable message IDs. The result is controllable memory consumption, no large-scale duplicate consumption, and near-zero migration effort.
Hierarchical time wheel: on-demand index loading keeps memory usage predictable.
Expiration re-push: compresses hole impact from full-scale to minute-level.
Immutable message ID: enables zero-effort client migration.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
