Why Kafka Beats Redis List: A Deep Dive into Message Queue Architecture
This article compares popular message middleware such as Redis, Kafka, and Pulsar, explaining their underlying data structures, strengths and weaknesses, and how concepts like partitions, replication, cursors, and storage segmentation enable high performance, scalability, and reliability in modern distributed messaging systems.
1. The Most Basic Queue
The simplest message queue can be implemented as a double‑ended queue using a doubly linked list, with operations push_front (add to head) and pop_tail (remove from tail). Producers add messages, consumers remove them.
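As a minimal sketch of the queue described above, the following uses Python's collections.deque to stand in for the doubly linked list. The class name SimpleQueue is an illustrative assumption; only push_front and pop_tail come from the text.

```python
from collections import deque


class SimpleQueue:
    """A minimal in-memory FIFO queue backed by a double-ended queue."""

    def __init__(self):
        self._items = deque()

    def push_front(self, message):
        # Producer side: add the new message at the head.
        self._items.appendleft(message)

    def pop_tail(self):
        # Consumer side: remove the oldest message from the tail.
        # Returns None when the queue is empty.
        return self._items.pop() if self._items else None


queue = SimpleQueue()
for msg in ("a", "b", "c"):
    queue.push_front(msg)

first = queue.pop_tail()   # the oldest message comes out first
second = queue.pop_tail()
```

Because messages enter at one end and leave at the other, consumers see them in the order producers sent them.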
2. Redis Queue
Redis provides a list data type that supports lpush (push left) and rpop (pop right), directly mapping to the abstract queue operations. Redis lists are fast and well‑optimized, but they have drawbacks:
Persistence: RDB snapshots and the AOF log are not fully durable; with default settings, writes made shortly before a crash can be lost.
Hot-key performance: a list lives under a single key on a single node, so heavy read/write traffic on it creates a hot key that adding machines cannot relieve.
No acknowledgment: once rpop removes a message, it cannot be recovered if the consumer fails before finishing it.
No multi-subscriber support: each message is delivered to exactly one consumer, so it cannot be broadcast to multiple services.
No re-consumption: popped messages are deleted and cannot be replayed.
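The missing acknowledgment can be illustrated with a small in-memory model. This is not a Redis client: ListQueue and flaky_consumer are hypothetical names that only mimic lpush and rpop, assuming a consumer that crashes after popping a message.

```python
from collections import deque


class ListQueue:
    """In-memory stand-in for a Redis list used as a queue (not a Redis client)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        self._items.appendleft(value)

    def rpop(self):
        return self._items.pop() if self._items else None

    def __len__(self):
        return len(self._items)


def flaky_consumer(queue):
    # The message leaves the list as soon as rpop returns.
    message = queue.rpop()
    raise RuntimeError(f"crashed while processing {message!r}")


orders = ListQueue()
orders.lpush("order-1")
orders.lpush("order-2")

try:
    flaky_consumer(orders)
except RuntimeError:
    pass  # the consumer died; nothing puts "order-1" back

remaining = len(orders)        # only "order-2" is left
next_message = orders.rpop()
```

In this model "order-1" was never processed, yet no copy of it remains for a retry or for a second consumer.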
Redis 5.0 introduced the stream data type, a more advanced structure inspired by Kafka, but it still has limitations.
3. Kafka
Kafka was designed as a dedicated message-middleware system. It addresses two core problems of Redis lists: hot-key bottlenecks and deletion on read. For the first, Kafka splits a logical topic into multiple partitions that can be placed on different brokers, spreading the load.
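One common way to spread messages over partitions is to hash a message key. The sketch below assumes a CRC32 hash and three partitions purely for illustration; Kafka's own partitioner uses a different hash function, and choose_partition is a hypothetical helper.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A deterministic hash sends every message with the same key
    # to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)
events = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
for key, value in events:
    partitions[choose_partition(key)].append((key, value))
```

Messages for the same key stay in order inside one partition, while different keys can land on different partitions and therefore on different brokers.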
Kafka stores each partition as an append‑only log divided into segment files. A cursor (offset) tracks consumption without deleting data, enabling ACKs, replay, and multiple consumer groups.
Consumers belong to a consumer group; each group keeps its own cursor for every partition, allowing independent consumption of the same topic. Within a group, a given partition is read by only one consumer at a time, which preserves ordering within that partition.
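The combination of an append-only log and per-group cursors can be sketched in a few lines. This is a simplified in-memory model under stated assumptions, not Kafka's implementation: PartitionLog and its methods poll, commit, and seek are hypothetical names.

```python
class PartitionLog:
    """Toy append-only log for one partition with one cursor per consumer group."""

    def __init__(self):
        self._records = []
        self._next_offset = {}  # group name -> next offset to read

    def append(self, record) -> int:
        # Records are only ever appended; the list index is the offset.
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 1):
        # Reading does not delete anything; it returns (offset, record) pairs.
        start = self._next_offset.get(group, 0)
        chunk = self._records[start:start + max_records]
        return list(enumerate(chunk, start=start))

    def commit(self, group: str, offset: int):
        # Acknowledge everything up to and including `offset`.
        self._next_offset[group] = offset + 1

    def seek(self, group: str, offset: int):
        # Move the cursor back (or forward) to replay from `offset`.
        self._next_offset[group] = offset


log = PartitionLog()
for event in ("created", "paid", "shipped"):
    log.append(event)

batch_a = log.poll("groupA", max_records=2)
log.commit("groupA", batch_a[-1][0])       # groupA acknowledges offsets 0 and 1
batch_b = log.poll("groupB")               # groupB still starts at offset 0
log.seek("groupA", 0)                      # groupA rewinds to replay
replayed = log.poll("groupA", max_records=3)
```

Since consuming only moves a cursor, an unacknowledged message is simply read again, and each group progresses at its own pace.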
When a consumer resets its cursor, Kafka uses the segment file name (which is the first offset) and a sparse index to locate the desired message efficiently.
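The two-step lookup can be sketched with binary search. The segment base offsets below are the ones used in this article's file listing; the sparse index entries (offset, byte position) and both function names are invented for illustration and do not reflect Kafka's on-disk format.

```python
import bisect

# Base offsets taken from segment file names such as 18234.log.
SEGMENT_BASE_OFFSETS = [0, 18234, 39712, 54101]

# Hypothetical sparse index for segment 18234: only some offsets are listed.
SPARSE_INDEX = {18234: [(18234, 0), (18300, 4096), (18420, 8192)]}


def find_segment(target_offset, base_offsets=SEGMENT_BASE_OFFSETS):
    # Pick the last segment whose base offset is <= the target offset.
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError(f"offset {target_offset} precedes the first segment")
    return base_offsets[i]


def find_scan_start(target_offset, segment_base, index_entries):
    # Pick the last index entry at or before the target; the reader then
    # scans forward in the log file from that byte position.
    offsets = [offset for offset, _ in index_entries]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return (segment_base, 0)  # no earlier entry: scan from the file start
    return index_entries[i]


segment = find_segment(18350)
scan_start = find_scan_start(18350, segment, SPARSE_INDEX[segment])
```

Both steps are logarithmic in the number of entries, and the sparse index stays small because it does not record every offset.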
4. Kafka High Availability
Each partition has a leader and multiple followers. Producers write to the leader, whose log the followers replicate. Acknowledgment strategies trade latency against durability: acknowledging as soon as the leader has written the message is fast, but the message is lost if the leader fails before replication; acknowledging only after all in-sync replicas have the message is safer but slower.
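The trade-off can be made concrete with a toy model. The Partition class and the labels "leader" and "all" are hypothetical; they correspond roughly to the Kafka producer settings acks=1 and acks=all, and replication here is a plain function call rather than a network protocol.

```python
class Partition:
    """Toy replicated partition: one leader log plus follower logs."""

    def __init__(self, follower_count: int = 2):
        self.leader_log = []
        self.follower_logs = [[] for _ in range(follower_count)]

    def replicate(self):
        # Copy to each follower whatever it is still missing.
        for log in self.follower_logs:
            log.extend(self.leader_log[len(log):])

    def produce(self, record, acks: str) -> int:
        self.leader_log.append(record)
        if acks == "all":
            self.replicate()   # acknowledge only after followers have the record
        elif acks != "leader":
            raise ValueError(f"unknown acks mode: {acks}")
        return len(self.leader_log) - 1

    def fail_over(self):
        # The leader is lost; the first follower is promoted to leader.
        self.leader_log = self.follower_logs.pop(0)
        return self.leader_log


fast = Partition()
fast.produce("m1", acks="leader")   # acknowledged before any replication
survivors_fast = fast.fail_over()

safe = Partition()
safe.produce("m1", acks="all")      # acknowledged after replication
survivors_safe = fast_copy = safe.fail_over()
```

In this model the message acknowledged after only the leader write disappears when the leader fails, while the fully replicated one survives on the promoted follower.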
5. Kafka Advantages and Disadvantages
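Advantages include high throughput (up to roughly 1M TPS, depending on hardware and configuration), low latency, high availability, and a mature tooling ecosystem.
Drawbacks include limited elastic scaling (a single broker can become a bottleneck because partition data is tied to the brokers that store it), costly rebalancing, and performance degradation as the number of partitions grows.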
6. Pulsar
Pulsar separates compute and storage: stateless brokers handle API requests, while Apache BookKeeper provides durable segment storage with configurable replication. Partitions are split into segments stored across multiple BookKeeper nodes, making storage scaling easy.
Because brokers are stateless, they can be scaled horizontally without moving data. BookKeeper’s ledger abstraction stores each segment with multiple replicas; if a BookKeeper node fails, other replicas serve the data.
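To illustrate why spreading replicated segments over storage nodes tolerates a node failure, here is a small sketch. The round-robin placement, the node names, and both functions are assumptions made for this example; BookKeeper's actual placement policy is not described in this article.

```python
BOOKIES = ["bookie-1", "bookie-2", "bookie-3", "bookie-4"]  # hypothetical nodes
REPLICAS = 2  # hypothetical replication factor


def place_segments(segment_ids, bookies, replicas):
    # Assign each segment to `replicas` consecutive nodes, round-robin.
    if replicas > len(bookies):
        raise ValueError("not enough storage nodes for the replication factor")
    placement = {}
    for i, segment in enumerate(segment_ids):
        placement[segment] = [bookies[(i + r) % len(bookies)] for r in range(replicas)]
    return placement


def readable_from(segment, placement, failed):
    # Nodes that still hold a copy of the segment after some nodes failed.
    return [node for node in placement[segment] if node not in failed]


placement = place_segments(["segment-0", "segment-1", "segment-2"], BOOKIES, REPLICAS)
still_available = readable_from("segment-1", placement, failed={"bookie-2"})
```

Because the mapping from segment to node lives outside the brokers, adding a storage node only affects where new segments are written; existing data does not have to move.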
Pulsar introduces subscriptions (exclusive, failover, shared, key‑shared) that abstract consumer groups and support both queue and stream consumption models.
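The difference between the subscription types shows up in how messages are distributed among the consumers of one subscription. The sketch below is a deliberate simplification and not Pulsar's dispatcher: exclusive and failover are both modeled as "one active consumer receives everything", shared as round-robin, and key-shared as hashing the key with CRC32. The dispatch function and mode strings are hypothetical.

```python
import zlib


def dispatch(messages, consumers, mode):
    """Return {consumer: [payloads]} for a list of (key, payload) messages."""
    assignment = {consumer: [] for consumer in consumers}
    for i, (key, payload) in enumerate(messages):
        if mode in ("exclusive", "failover"):
            # One active consumer handles the whole stream, in order.
            target = consumers[0]
        elif mode == "shared":
            # Queue semantics: messages are spread across consumers.
            target = consumers[i % len(consumers)]
        elif mode == "key_shared":
            # Messages with the same key always reach the same consumer.
            target = consumers[zlib.crc32(key.encode("utf-8")) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription mode: {mode}")
        assignment[target].append(payload)
    return assignment


messages = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k2", "d")]
consumers = ["c1", "c2"]

exclusive = dispatch(messages, consumers, "exclusive")
shared = dispatch(messages, consumers, "shared")
key_shared = dispatch(messages, consumers, "key_shared")
```

Exclusive and failover give stream-style ordered consumption, shared gives queue-style load balancing, and key-shared keeps per-key ordering while still spreading work.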
7. Storage‑Compute Separation
The evolution from monolithic storage to distributed systems (NAS → HDFS → BookKeeper) reflects the need for scalable, reliable, low‑latency storage. Pulsar’s architecture exemplifies this trend, offering flexible consumption models while delegating durability and replication to a dedicated storage layer.
Example consumer-group cursors for a topic, with one offset per partition:

{
  "topic-foo": {
    "groupA": {
      "partition-0": 0,
      "partition-1": 123,
      "partition-2": 78
    },
    "groupB": {
      "partition-0": 85,
      "partition-1": 9991,
      "partition-2": 772
    }
  }
}

Example segment files for one partition, each named after its first offset:

- /kafka/topic/order_create/partition-0
  - 0.log
  - 18234.log   # segment file
  - 39712.log
  - 54101.log

The same partition with its sparse index files:

- /kafka/topic/order_create/partition-0
  - 0.log
  - 0.index
  - 18234.log   # segment file
  - 18234.index # index file
  - 39712.log
  - 39712.index
  - 54101.log
  - 54101.index
ITFLY8 Architecture Home
