Understanding Message Queues: From Redis List to Kafka and Pulsar Architecture
This article explains the evolution of message‑queue middleware by comparing basic double‑ended queue implementations, Redis list usage, Kafka’s partitioned log design with cursor‑based consumption and high‑availability replication, and Pulsar’s compute‑storage separation using BookKeeper, while highlighting their strengths, limitations, and practical trade‑offs.
The article begins by noting the abundance of message‑middleware solutions such as RabbitMQ, Kafka, RocketMQ, Pulsar, and Redis, and sets out to explain their differences by tracing how the designs evolved.
1. The most basic queue – a double‑ended queue can be implemented with a doubly linked list, supporting push_front (add to head) and pop_tail (remove from tail). Producers add messages, consumers remove them.
While this in‑memory structure is simple, scaling it to massive concurrent reads/writes requires careful engineering.
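The queue described above can be sketched in a few lines. This is a minimal, single-threaded illustration; the class name `LinkedQueue` is an assumption made for the example, and only `push_front` and `pop_tail` come from the text.

```python
from typing import Any, Optional


class _Node:
    """One element of the doubly linked list."""
    __slots__ = ("value", "prev", "next")

    def __init__(self, value: Any) -> None:
        self.value = value
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class LinkedQueue:
    """Queue on a doubly linked list: producers push at the head, consumers pop at the tail."""

    def __init__(self) -> None:
        self._head: Optional[_Node] = None
        self._tail: Optional[_Node] = None
        self._size = 0

    def push_front(self, value: Any) -> None:
        node = _Node(value)
        node.next = self._head
        if self._head is not None:
            self._head.prev = node
        else:
            self._tail = node  # first element is both head and tail
        self._head = node
        self._size += 1

    def pop_tail(self) -> Any:
        if self._tail is None:
            raise IndexError("pop from empty queue")
        node = self._tail
        self._tail = node.prev
        if self._tail is not None:
            self._tail.next = None
        else:
            self._head = None  # queue is now empty
        self._size -= 1
        return node.value

    def __len__(self) -> int:
        return self._size
```

Messages come out in the order they went in. The sketch has no locking, which is one reason the in-memory version needs more engineering before it can serve many concurrent producers and consumers.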
2. Redis as a queue – Redis provides a list data type that directly maps to the abstract queue operations: lpush (push to left) and rpop (pop from right). Redis’s high‑performance implementation makes it a strong candidate, but it suffers from several drawbacks:
Message persistence: Redis is primarily in‑memory; AOF and RDB persistence are secondary mechanisms, and depending on the fsync and snapshot settings, recent writes can be lost on a crash.
Hot‑key performance: A single list can become a hot key that cannot be scaled by adding nodes.
No acknowledgment mechanism: Once rpop removes a message, it cannot be recovered if the consumer fails.
No multi‑subscriber support: Each message is delivered to only one consumer, which rules out broadcast scenarios.
No re‑consumption: Popped messages are deleted, so after a failure you cannot replay them from an earlier point.
Some of these issues can be mitigated (e.g., using RocksDB‑based KV stores for persistence), but others remain.
Redis 5.0 introduced streams, a data structure designed for messaging, yet the article moves on to discuss dedicated message systems.
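The missing-acknowledgment problem is easy to reproduce without a Redis server. The sketch below is an in-memory stand-in, not a Redis client: `ListQueueModel`, `consume_once`, and `failing_handler` are hypothetical names, and only the `lpush`/`rpop` semantics mirror the commands named in the text.

```python
from collections import deque
from typing import Callable, Optional


class ListQueueModel:
    """In-memory stand-in for a Redis list used as a queue (not a Redis client)."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def lpush(self, message: str) -> int:
        self._items.appendleft(message)
        return len(self._items)

    def rpop(self) -> Optional[str]:
        # Like RPOP, the message is removed the moment it is handed out.
        return self._items.pop() if self._items else None


def consume_once(queue: ListQueueModel, handler: Callable[[str], None]) -> bool:
    """Pop one message and process it; return True on success."""
    message = queue.rpop()
    if message is None:
        return False
    try:
        handler(message)
        return True
    except Exception:
        # The message is already gone from the list: nothing is left to retry.
        return False


def failing_handler(message: str) -> None:
    raise RuntimeError(f"consumer crashed while handling {message!r}")


queue = ListQueueModel()
queue.lpush("order-1")
queue.lpush("order-2")
processed = consume_once(queue, failing_handler)  # pops "order-1", then fails
remaining = queue.rpop()                          # only "order-2" is still queued
```

After the failed attempt, "order-1" exists neither in the queue nor in any completed work, which is the gap that cursor-based systems close.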
3. Kafka – Kafka solves the two core problems of Redis queues: hot‑key bottlenecks and data loss. It introduces partitions (splitting a logical topic into multiple ordered logs) and stores each partition as a series of segment files. New messages are appended to the latest segment, and a cursor tracks consumption without deleting data.
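A rough sketch of the partitioned, append-only idea follows. It assumes a CRC32 hash of the message key picks the partition, which is a stand-in for illustration rather than Kafka's actual partitioner, and `PartitionedTopic` is a hypothetical name.

```python
import zlib
from typing import List, Tuple


class PartitionedTopic:
    """A topic split into append-only partitions; reads never delete data."""

    def __init__(self, num_partitions: int) -> None:
        self._partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append to the partition chosen by key hash; return (partition, offset)."""
        partition = zlib.crc32(key) % len(self._partitions)
        log = self._partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def read(self, partition: int, offset: int, max_messages: int = 10) -> List[bytes]:
        """Return messages starting at offset without removing them."""
        return self._partitions[partition][offset:offset + max_messages]


topic = PartitionedTopic(3)
p1, o1 = topic.append(b"user-1", b"created")
p2, o2 = topic.append(b"user-1", b"paid")
first_read = topic.read(p1, 0)
second_read = topic.read(p1, 0)  # same data again: reading is non-destructive
```

Messages with the same key land in the same partition and keep their relative order, while different keys spread the load across partitions, which is what removes the single hot key.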
Key advantages include:
Support for ACK semantics via cursor advancement.
Consumer groups with isolated cursors, enabling broadcast (1‑N) consumption.
Efficient random‑access reads using index files; each index entry maps offset → position in the segment.
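To keep index size manageable, Kafka uses sparse indexing: rather than indexing every message, it adds an index entry only after a configurable amount of log data has been appended (index.interval.bytes), then binary‑searches for the nearest entry at or below the target offset and scans forward from that position.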
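The lookup can be sketched as follows. This is a simplified model rather than Kafka's on-disk format: records are stored in a `bytearray` with a 4-byte length prefix, and `SparseIndexedSegment`, `floor_entry`, and the interval parameter are names assumed for the example.

```python
import bisect
import struct
from typing import List, Tuple


class SparseIndexedSegment:
    """One log segment plus a sparse offset -> byte-position index."""

    def __init__(self, base_offset: int, index_interval_bytes: int) -> None:
        self.base_offset = base_offset
        self.index_interval_bytes = index_interval_bytes
        self._log = bytearray()
        self._next_offset = base_offset
        self._index_offsets: List[int] = []
        self._index_positions: List[int] = []
        self._bytes_since_entry = 0

    def append(self, payload: bytes) -> int:
        offset = self._next_offset
        # Index the first record, then one record per index_interval_bytes of data.
        if not self._index_offsets or self._bytes_since_entry >= self.index_interval_bytes:
            self._index_offsets.append(offset)
            self._index_positions.append(len(self._log))
            self._bytes_since_entry = 0
        record = struct.pack(">I", len(payload)) + payload
        self._log += record
        self._bytes_since_entry += len(record)
        self._next_offset += 1
        return offset

    def floor_entry(self, target_offset: int) -> Tuple[int, int]:
        """Binary-search the nearest index entry at or before target_offset."""
        i = bisect.bisect_right(self._index_offsets, target_offset) - 1
        if i < 0:
            raise KeyError(target_offset)
        return self._index_offsets[i], self._index_positions[i]

    def read(self, target_offset: int) -> bytes:
        """Jump to the indexed position, then scan forward to the target."""
        if not self.base_offset <= target_offset < self._next_offset:
            raise KeyError(target_offset)
        offset, position = self.floor_entry(target_offset)
        while True:
            (length,) = struct.unpack_from(">I", self._log, position)
            if offset == target_offset:
                return bytes(self._log[position + 4:position + 4 + length])
            position += 4 + length
            offset += 1
```

The trade-off is a smaller index in exchange for a short forward scan: a larger interval shrinks the index but lengthens the scan after each lookup.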
Retention is handled by deleting whole segment files once all messages in a segment have expired, avoiding costly per‑message deletions.
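Segment-level retention can be sketched as a filter over segment metadata. The example assumes a purely time-based policy and that the last segment in the list is the one still receiving appends; `Segment` and `expire_segments` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int
    newest_timestamp_ms: int  # timestamp of the most recent message in the segment


def expire_segments(segments: List[Segment], now_ms: int, retention_ms: int) -> List[Segment]:
    """Drop closed segments whose newest message is older than the retention window.

    The last segment is treated as the active one and is always kept.
    """
    if not segments:
        return []
    *closed, active = segments
    kept = [s for s in closed if now_ms - s.newest_timestamp_ms <= retention_ms]
    return kept + [active]
```

Because the newest message decides the fate of the whole segment, expiry is a single file deletion and no message is removed before its retention time has passed.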
High Availability – Each partition has a leader and multiple followers. Producers write to the leader; followers replicate the data. Acknowledgment can be configured to wait for the leader only (low latency) or for all in‑sync replicas (high durability).
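The two acknowledgment levels (the producer settings `acks=1` and `acks=all` in Kafka) can be compared with a small simulation. It assumes every follower is in sync and that replication happens synchronously inside the call, which real brokers do not do; `ReplicatedPartition` is a hypothetical name.

```python
from typing import List


class ReplicatedPartition:
    """Leader log plus follower logs; a simplified model of acknowledgment levels."""

    def __init__(self, follower_count: int) -> None:
        self.leader_log: List[bytes] = []
        self.follower_logs: List[List[bytes]] = [[] for _ in range(follower_count)]

    def replicate(self) -> None:
        """Followers copy whatever they are missing from the leader."""
        for log in self.follower_logs:
            log.extend(self.leader_log[len(log):])

    def produce(self, message: bytes, wait_for_all: bool) -> int:
        """Append to the leader; optionally replicate before acknowledging.

        Returns the number of replicas holding the message when the ack is sent.
        """
        self.leader_log.append(message)
        if wait_for_all:
            self.replicate()
        offset = len(self.leader_log) - 1
        return 1 + sum(1 for log in self.follower_logs if len(log) > offset)


partition = ReplicatedPartition(follower_count=2)
copies_leader_only = partition.produce(b"a", wait_for_all=False)
copies_all = partition.produce(b"b", wait_for_all=True)
```

With a leader-only acknowledgment, a leader crash right after the ack can lose the message because no follower holds it yet; waiting for all replicas closes that window at the cost of latency.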
Conceptually, each consumer group keeps its own committed offset for every partition of a topic, as in this example:
{
  "topic-foo": {
    "groupA": {
      "partition-0": 0,
      "partition-1": 123,
      "partition-2": 78
    },
    "groupB": {
      "partition-0": 85,
      "partition-1": 9991,
      "partition-2": 772
    }
  }
}
Kafka’s architecture provides high throughput, low latency, and robust tooling, but scaling can be painful because adding brokers does not automatically rebalance existing partitions.
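Returning to the offset layout, the way cursors give both acknowledgment and per-group isolation can be sketched as below. The nested dictionary mirrors the topic, group, partition layout of the example; `OffsetStore` and `poll` are hypothetical names, and real Kafka keeps committed offsets in an internal topic (`__consumer_offsets`) rather than in a structure like this.

```python
from collections import defaultdict
from typing import Dict, List


class OffsetStore:
    """Committed cursors keyed by topic, group, and partition."""

    def __init__(self) -> None:
        self._offsets: Dict[str, Dict[str, Dict[int, int]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def committed(self, topic: str, group: str, partition: int) -> int:
        return self._offsets[topic][group].get(partition, 0)

    def commit(self, topic: str, group: str, partition: int, next_offset: int) -> None:
        self._offsets[topic][group][partition] = next_offset


def poll(log: List[str], store: OffsetStore, topic: str, group: str,
         partition: int, max_messages: int = 2) -> List[str]:
    """Read from the group's cursor and advance it once the batch is handled."""
    start = store.committed(topic, group, partition)
    batch = log[start:start + max_messages]
    # Processing would happen here; a crash before commit means the batch is re-read.
    store.commit(topic, group, partition, start + len(batch))
    return batch


log = ["m0", "m1", "m2"]
store = OffsetStore()
a_first = poll(log, store, "topic-foo", "groupA", 0)
a_second = poll(log, store, "topic-foo", "groupA", 0)
b_first = poll(log, store, "topic-foo", "groupB", 0)
```

Advancing the cursor plays the role of an ACK, and because groupB has its own cursor it still sees every message that groupA already consumed.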
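4. Pulsar – Pulsar adopts a compute‑storage separation model. The stateless broker handles API requests, while persistent storage is delegated to BookKeeper, which stores data in ledgers (analogous to segment files). Each segment is replicated across multiple bookies according to a configurable triple (n, m, t), which corresponds to BookKeeper's ensemble size (n candidate bookies), write quorum (m copies of each entry), and ack quorum (t acknowledgments required before a write counts as successful).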
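One way to picture the (n, m, t) setting is as entry placement over a set of bookies. The sketch assumes n is the ensemble size, m the write quorum, and t the ack quorum, and that entries are striped round-robin across the ensemble; `write_set` and `is_acknowledged` are hypothetical helper names, not BookKeeper APIs.

```python
from typing import List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Pick the bookies that store one entry, striping round-robin over the ensemble."""
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


def is_acknowledged(acks_received: int, ack_quorum: int) -> bool:
    """An entry counts as written once ack_quorum bookies have confirmed it."""
    return acks_received >= ack_quorum


ensemble = ["bookie-1", "bookie-2", "bookie-3"]  # n = 3
placements = [write_set(entry_id, ensemble, write_quorum=2) for entry_id in range(3)]  # m = 2
```

With n larger than m, consecutive entries land on different bookies, so write load is spread across the ensemble while each entry still has m copies.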
Key benefits:
Broker scaling is trivial (stateless services).
Segments are distributed across BookKeeper nodes, enabling seamless storage expansion without data migration.
High availability is achieved through ledger replication; any failed bookie can be bypassed.
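For comparison, this is the per‑partition layout on a Kafka broker, where segment and index files are named by base offset; a BookKeeper ledger plays the role of one such segment, but is spread across bookies instead of living in a single broker directory: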
- /kafka/topic/order_create/partition-0
  - 0.log
  - 0.index
  - 18234.log      # segment file
  - 18234.index    # index file
  - 39712.log
  - 39712.index
  - 54101.log
  - 54101.index
Pulsar introduces subscriptions (similar to Kafka consumer groups) with four consumption models: exclusive, failover, shared, and key‑shared, allowing both queue‑style and stream‑style consumption patterns.
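The four subscription models differ mainly in which consumer receives a given message. The following is a rough dispatch sketch under simplifying assumptions: shared uses plain round-robin, key-shared uses a CRC32 hash of the key, and failover treats the first consumer in the list as the active one. `make_dispatcher` is a hypothetical name, and Pulsar's real dispatchers are more involved.

```python
import zlib
from itertools import count
from typing import Callable, List


def make_dispatcher(mode: str, consumers: List[str]) -> Callable[[str], str]:
    """Return a function mapping a message key to the consumer that receives it."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    if mode == "exclusive" and len(consumers) > 1:
        raise ValueError("an exclusive subscription allows a single consumer")
    round_robin = count()

    def dispatch(key: str) -> str:
        if mode in ("exclusive", "failover"):
            # One active consumer; in failover the others are standbys.
            return consumers[0]
        if mode == "shared":
            return consumers[next(round_robin) % len(consumers)]
        if mode == "key_shared":
            return consumers[zlib.crc32(key.encode()) % len(consumers)]
        raise ValueError(f"unknown mode: {mode}")

    return dispatch


shared = make_dispatcher("shared", ["c1", "c2"])
shared_order = [shared("any-key") for _ in range(4)]

standbys = ["c1", "c2"]
failover = make_dispatcher("failover", standbys)
before_failure = failover("any-key")
standbys.pop(0)  # the active consumer disconnects
after_failure = failover("any-key")
```

Exclusive and failover preserve ordering because one consumer sees everything (stream-style), shared trades ordering for parallelism (queue-style), and key-shared keeps per-key ordering while still spreading load.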
Overall, the article provides a comprehensive comparison of message‑queue architectures, illustrating how Kafka’s partitioned log design and Pulsar’s storage‑separated model address scalability, durability, and consumption flexibility.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
