
Understanding Message Queues: From Redis List to Kafka and Pulsar

This article explains the evolution of message‑queue middleware by comparing the basic double‑ended queue implementation, Redis list usage, Kafka’s partitioned log architecture with segment files and sparse indexes, and Pulsar’s compute‑storage separation using BookKeeper, highlighting their designs, strengths, and trade‑offs.

IT Architects Alliance

Jun 11, 2021


The article begins by noting how many message-middleware options exist, such as RabbitMQ, Kafka, RocketMQ, Pulsar, and Redis, and sets out to compare them by tracing how the technology evolved.

1. The Most Basic Queue – A double-ended queue can be implemented with a doubly linked list that supports push_front and pop_tail. Producers add messages at the front and consumers remove them from the tail. Because the structure lives in a single process's memory, making it safe and fast under high concurrency is non-trivial.
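As a minimal sketch of that structure, the doubly linked list below exposes only the two operations the article names, push_front and pop_tail. The class names `Node` and `LinkedQueue` are illustrative and do not come from the article.

```python
from typing import Any, Optional


class Node:
    """One element of the doubly linked list."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class LinkedQueue:
    """Queue backed by a doubly linked list: push at the front, pop at the tail."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None
        self.size = 0

    def push_front(self, value: Any) -> None:
        node = Node(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        else:
            self.tail = node  # first element is both head and tail
        self.head = node
        self.size += 1

    def pop_tail(self) -> Any:
        if self.tail is None:
            raise IndexError("pop from empty queue")
        node = self.tail
        self.tail = node.prev
        if self.tail is not None:
            self.tail.next = None
        else:
            self.head = None  # queue is now empty
        self.size -= 1
        return node.value


q = LinkedQueue()
for msg in ("a", "b", "c"):
    q.push_front(msg)   # producer side
first = q.pop_tail()    # consumer side: the oldest message comes out first
```

Both operations are O(1), which is why the linked list is a natural fit. Nothing in this sketch is thread-safe, which is the scaling problem the article points to.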

2. Redis List as a Queue – Redis provides the LPUSH and RPOP commands, which map directly onto the abstract queue operations. Redis offers high-performance in-memory storage, but as a queue it suffers from persistence limitations, hot-key bottlenecks (all traffic for one queue lands on one key), no acknowledgments, and pop-once semantics: a message is removed as soon as one consumer reads it, so it cannot be delivered to multiple subscribers or replayed.
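The pop-once behaviour can be illustrated without a running server. The sketch below uses a hypothetical in-memory stand-in, `ListStore`, whose two methods imitate what LPUSH and RPOP do to a list; it is not a Redis client and it ignores networking and persistence entirely.

```python
from collections import deque


class ListStore:
    """Hypothetical in-memory stand-in for a Redis server holding list keys."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, value):
        # Like LPUSH: insert at the head and return the new length.
        self._lists.setdefault(key, deque()).appendleft(value)
        return len(self._lists[key])

    def rpop(self, key):
        # Like RPOP: remove and return the tail element, or None if empty.
        items = self._lists.get(key)
        if not items:
            return None
        return items.pop()


store = ListStore()
store.lpush("jobs", "job-1")
store.lpush("jobs", "job-2")

taken = store.rpop("jobs")     # consumer A receives "job-1"
# If consumer A crashes here, "job-1" is gone: there is no ACK to withhold
# and no stored copy to replay.
other = store.rpop("jobs")     # consumer B receives "job-2", never "job-1"
leftover = store.rpop("jobs")  # None: the list is empty
```

Every message for the `jobs` queue also passes through a single key, which is the hot-key bottleneck the article mentions.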

To mitigate persistence issues, some teams build RocksDB/LevelDB‑based stores that speak the Redis protocol, but other drawbacks remain. Redis 5.0 introduced the STREAM type, yet the article suggests moving to Kafka for a more robust solution.

3. Kafka Architecture – Kafka solves the hot-key and data-deletion problems by introducing partitions (logical arrays) that are distributed across multiple brokers. Each partition stores its data in sequential segment files, and each consumer's progress is tracked by a cursor (an offset) rather than by deleting records, which is what makes ACKs and replay possible.
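The idea of append-only partitions plus per-consumer cursors can be sketched in a few lines. This is a toy model, not Kafka's implementation: the `Topic` class and its method names are illustrative, everything lives in one process, and the CRC32 key hash is an assumption chosen only because it is in the standard library.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy topic: a fixed number of append-only partitions plus group cursors."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # cursors[group][partition] -> next offset that group will read
        self.cursors = defaultdict(lambda: defaultdict(int))

    def send(self, key, value):
        # Hash the key so load spreads across partitions (no single hot key).
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition):
        # Read at the group's cursor; the record itself is never deleted.
        offset = self.cursors[group][partition]
        log = self.partitions[partition]
        return (offset, log[offset]) if offset < len(log) else None

    def ack(self, group, partition, offset):
        # Acknowledging a record just moves the cursor past it.
        self.cursors[group][partition] = offset + 1

    def seek(self, group, partition, offset):
        # Replay: move the cursor back to an earlier offset.
        self.cursors[group][partition] = offset


topic = Topic(num_partitions=3)
p, off = topic.send("user-42", "signup")
topic.send("user-42", "login")             # same key, same partition

first = topic.poll("groupA", p)            # groupA reads offset 0
topic.ack("groupA", p, first[0])
second = topic.poll("groupA", p)           # groupA now reads offset 1
independent = topic.poll("groupB", p)      # groupB still starts at offset 0
topic.seek("groupA", p, 0)
replayed = topic.poll("groupA", p)         # groupA sees offset 0 again
```

Because reading never removes data, two groups can consume the same partition independently, and a group can rewind its cursor to replay messages.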

Segment files are named by their first offset (e.g., 0.log, 18234.log; real Kafka file names zero-pad the offset to 20 digits), so a binary search over the file names finds the segment that contains a target offset.
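A minimal sketch of that lookup, assuming the base offsets have already been parsed from the file names into a sorted list. The function name `find_segment` and the third segment, 40112, are made up for illustration.

```python
import bisect


def find_segment(base_offsets, target):
    """Return the base offset of the segment that would hold `target`.

    `base_offsets` must be sorted ascending, one entry per segment file.
    """
    # bisect_right gives the insertion point after any equal entry, so the
    # segment we want is the one just before that point.
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is older than the first segment")
    return base_offsets[i]


segments = [0, 18234, 40112]  # hypothetical: 0.log, 18234.log, 40112.log
in_first = find_segment(segments, 18233)
in_second = find_segment(segments, 18234)
in_last = find_segment(segments, 50000)
```

The search only selects a candidate file. Whether an offset past the last base offset actually exists still has to be checked against that segment's contents.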

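Each segment has a corresponding index file that maps offsets to positions in the segment file. Kafka uses a sparse index to balance space against lookup speed: only some records get an entry (for illustration, one entry per 10 messages; in Kafka itself the spacing is set in bytes via index.interval.bytes), and a lookup finds the nearest preceding entry and then scans forward.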
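The sketch below shows a sparse-index lookup under simplifying assumptions: the segment is a bytes object in memory, each record uses a made-up header of an 8-byte offset and a 4-byte length, and one record in every 10 is indexed, matching the article's example. None of this is Kafka's on-disk format.

```python
import bisect
import struct

HEADER = struct.Struct(">QI")  # hypothetical record header: offset, payload length


def write_segment(payloads, base_offset=0, every=10):
    """Serialize payloads into one bytes blob and build a sparse index."""
    blob = bytearray()
    index = []  # sorted list of (offset, byte position)
    for n, payload in enumerate(payloads):
        if n % every == 0:
            index.append((base_offset + n, len(blob)))
        blob += HEADER.pack(base_offset + n, len(payload)) + payload
    return bytes(blob), index


def read_at(blob, index, target):
    """Return the payload stored at offset `target`, or raise KeyError."""
    # 1. Binary search for the last index entry at or before the target.
    i = bisect.bisect_right(index, (target, float("inf"))) - 1
    if i < 0:
        raise KeyError(target)
    pos = index[i][1]
    # 2. Scan forward record by record from that byte position.
    while pos < len(blob):
        offset, length = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        if offset == target:
            return blob[pos:pos + length]
        if offset > target:
            break
        pos += length
    raise KeyError(target)


payloads = [f"msg-{n}".encode() for n in range(35)]
blob, index = write_segment(payloads, base_offset=18234)
found = read_at(blob, index, 18234 + 27)
```

With 35 records the index holds only 4 entries, and any lookup scans at most 9 records past the indexed one. A denser index shortens the scan at the cost of a larger index file.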

Retention is handled by deleting whole expired segments, avoiding costly per‑message deletions.
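Segment-level retention can be sketched as a single pass over segment metadata. The `Segment` fields, the function name, and the rule that the newest (active) segment is always kept are assumptions of this sketch rather than details given in the article.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    last_write_ts: float  # timestamp of the newest record in the segment


def expire_segments(segments, now, retention_seconds):
    """Split segments into (kept, deleted) by the age of their newest record.

    The last segment is treated as the active one still receiving writes,
    so it is always kept in this sketch.
    """
    kept, deleted = [], []
    for seg in segments[:-1]:
        if now - seg.last_write_ts > retention_seconds:
            deleted.append(seg)
        else:
            kept.append(seg)
    kept.extend(segments[-1:])
    return kept, deleted


log = [Segment(0, 100.0), Segment(18234, 700.0), Segment(40112, 900.0)]
kept, deleted = expire_segments(log, now=1000.0, retention_seconds=500.0)
```

The cost is proportional to the number of segments, not the number of messages: deleting a segment removes a whole file in one step, and no surviving file has to be rewritten.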

High availability is achieved with a leader-follower replication model: each partition has one leader that handles reads and writes, and one or more followers that replicate its log. A producer can treat a write as complete once the leader has written it (faster, less reliable) or only after all in-sync replicas have it (slower, more reliable).

4. Pulsar Architecture – Pulsar separates compute from storage. Stateless brokers handle client requests, while Apache BookKeeper provides durable, replicated storage. Partitions are still used, but each partition's segments are written to BookKeeper ledgers, which are replicated across a configurable number of bookies (BookKeeper storage nodes). When a broker fails, another broker can take over write ownership without moving any data, because the data resides in BookKeeper.

Pulsar also introduces subscriptions (exclusive, failover, shared, key-shared), which abstract consumer groups and support both queue and stream consumption models.

Overall, the article concludes that Kafka's core complexity lies in its storage layer, while Pulsar shifts that complexity to BookKeeper, which makes the compute layer easier to scale and allows flexible consumption semantics.

Code example of a Kafka consumer-group offset map, which records each group's position in every partition of a topic:

    {
      "topic-foo": {
        "groupA": {"partition-0": 0, "partition-1": 123, "partition-2": 78},
        "groupB": {"partition-0": 85, "partition-1": 9991, "partition-2": 772}
      }
    }
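To show how an offset map like the one in the article's example might be used, the sketch below commits a new position for one group and computes how far each group is behind. The helper names `commit` and `lag` and the log-end offsets are hypothetical; only the map itself comes from the article.

```python
offsets = {
    "topic-foo": {
        "groupA": {"partition-0": 0, "partition-1": 123, "partition-2": 78},
        "groupB": {"partition-0": 85, "partition-1": 9991, "partition-2": 772},
    }
}


def commit(offsets, topic, group, partition, offset):
    """Record the group's position for one partition."""
    offsets.setdefault(topic, {}).setdefault(group, {})[partition] = offset


def lag(offsets, topic, group, log_end_offsets):
    """Messages not yet consumed, per partition, given each log's end offset."""
    committed = offsets[topic][group]
    return {p: end - committed.get(p, 0) for p, end in log_end_offsets.items()}


# groupA finishes one more message on partition-1 and commits its position.
commit(offsets, "topic-foo", "groupA", "partition-1", 124)

# Hypothetical end offsets of the three partition logs.
ends = {"partition-0": 100, "partition-1": 10000, "partition-2": 800}
lag_a = lag(offsets, "topic-foo", "groupA", ends)
lag_b = lag(offsets, "topic-foo", "groupB", ends)
```

Each group owns its own row in the map, so committing for groupA leaves groupB's positions untouched. That separation is what lets several groups read the same topic at different speeds.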


