Deep Dive into Kafka, RocketMQ, and JMQ Storage Architectures
This article compares the storage models, data organization, indexing, read/write processes, and performance trade‑offs of three major message queues—Kafka, RocketMQ, and JMQ—providing detailed technical insights for architects and engineers making storage‑related design decisions.
Kafka Storage Architecture
Topic and Partition – A topic is a logical namespace; each partition is a physical directory on a broker storing log segments. Partitions enable parallelism.
Data organization – Each partition is split into segment files (1 GB by default) named after the base offset of their first message (e.g., 00000000000000000000.log). Two sparse index files accompany each segment: .index (relative offset → byte position in the log) and .timeindex (timestamp → offset).
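The naming convention can be sketched in a few lines. This is a Python illustration, not Kafka's actual (Scala/Java) code; `segment_file_name` is a hypothetical helper showing how the 20-digit, zero-padded base offset becomes the file name:

```python
# Hypothetical sketch: deriving a Kafka-style segment file name from the
# base offset of the first message the segment will contain.
def segment_file_name(base_offset: int) -> str:
    # Kafka zero-pads the base offset to 20 digits.
    return f"{base_offset:020d}.log"

print(segment_file_name(0))        # 00000000000000000000.log
print(segment_file_name(170210))   # 00000000000000170210.log
```

Because names sort lexicographically in offset order, the broker can find the segment containing any offset with a simple floor search over the directory listing.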
Write path – Index files are memory‑mapped (mmap) for low‑overhead updates; segment files are appended sequentially and flushed asynchronously via the OS page cache.
Read path – Indexes are read via mmap; segment data is transferred with sendfile for zero‑copy.
Key techniques – Reliance on PageCache, sequential I/O, and mmap yields >100 k TPS per partition. Log rolling creates a new segment when log.segment.bytes (default 1 GB) is reached.
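The two-level lookup described above (find the segment, then floor-search the sparse index) can be sketched as follows. This is an illustrative Python model under assumed data shapes, not Kafka's implementation; the sparse index is represented as sorted `(relative_offset, byte_pos)` pairs:

```python
import bisect

# Hypothetical sketch of Kafka-style offset lookup.
def find_segment(base_offsets: list[int], target_offset: int) -> int:
    # Largest base offset <= target: that segment contains the message.
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    return base_offsets[i]

def floor_index_entry(sparse_index, relative_offset):
    # Largest indexed offset <= relative_offset; the broker then scans
    # forward in the log file from the returned byte position.
    offsets = [off for off, _ in sparse_index]
    i = bisect.bisect_right(offsets, relative_offset) - 1
    return sparse_index[i]

base_offsets = [0, 170210, 340420]          # one entry per segment
print(find_segment(base_offsets, 200000))   # 170210

sparse_index = [(0, 0), (100, 4096), (200, 8192)]
print(floor_index_entry(sparse_index, 150)) # (100, 4096)
```

The index is sparse, so a lookup ends with a short sequential scan inside the segment, which the page cache makes cheap.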
RocketMQ Storage Architecture
RocketMQ separates the data and index layers into three components:
CommitLog – Global append‑only log; each file defaults to 1 GB and rolls over when full.
ConsumeQueue – Per‑topic/queue index storing fixed‑length 20‑byte entries (CommitLog offset, message size, and tag hash); uses memory‑mapped files and asynchronous flushing.
IndexFile – Global hash‑based index for key lookups, with configurable slot count (e.g., 5 M) and index‑entry count (e.g., 20 M).
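The fixed 20-byte ConsumeQueue entry is what makes random access by queue position trivial. A Python sketch of the layout (an illustration of the format described above, not RocketMQ's Java code; big-endian field order is assumed):

```python
import struct

# Hypothetical sketch of a RocketMQ-style ConsumeQueue entry:
# 8-byte CommitLog offset + 4-byte message size + 8-byte tag hash = 20 bytes.
ENTRY = struct.Struct(">qiq")   # big-endian: int64, int32, int64

def pack_entry(commitlog_offset: int, size: int, tag_hash: int) -> bytes:
    return ENTRY.pack(commitlog_offset, size, tag_hash)

def unpack_entry(buf: bytes):
    return ENTRY.unpack(buf)

entry = pack_entry(1_048_576, 256, 2603186)
assert len(entry) == 20   # fixed length: entry N lives at byte N * 20
print(unpack_entry(entry))  # (1048576, 256, 2603186)
```

Because every entry is exactly 20 bytes, a consumer can seek straight to entry N at byte offset N × 20 without any search.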
Read/write flow:
Producer sends a message to a broker.
Broker appends the message to CommitLog.
Background threads build ConsumeQueue and IndexFile.
Consumer reads the ConsumeQueue entry to locate the CommitLog offset, then fetches the message body from the CommitLog. IndexFile enables key‑based lookups (e.g., tracing a message by its business key).
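Key lookups via the IndexFile start by hashing the message key into one of the configured slots. The sketch below models that slot selection in Python; `key_hash` imitates a Java-style `String.hashCode` (since RocketMQ hashes keys in the JVM), and the 5 M default slot count is taken from the figures above — treat both as illustrative assumptions:

```python
# Hypothetical sketch of RocketMQ-style IndexFile slot selection.
def key_hash(key: str) -> int:
    # Java-style String.hashCode: h = 31*h + ch, over signed 32-bit ints.
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    if h >= 0x80000000:          # reinterpret as signed 32-bit
        h -= 0x100000000
    return abs(h)

def slot_for(key: str, slot_num: int = 5_000_000) -> int:
    # Each slot heads a linked list of index entries sharing the hash.
    return key_hash(key) % slot_num
```

Entries that collide in a slot are chained, so a lookup walks a short list and verifies each candidate against the CommitLog.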
Performance tips – Place CommitLog on dedicated SSDs; ConsumeQueue and IndexFile can reside on cheaper disks. Monitor consumer lag before scaling storage.
JMQ Storage Architecture
JMQ combines concepts from Kafka and RocketMQ. Its basic unit is a PartitionGroup (a set of journal files) within a broker; each topic contains multiple partitions, each with its own index files.
Write path – Messages are serialized into a DirectBuffer and asynchronously flushed to journal files via FileChannel. JMQ avoids mmap for writes, reducing page‑fault overhead.
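The staged-buffer write path can be sketched as follows. This Python model stands in for the JVM mechanics (a `bytearray` in place of a DirectBuffer, a plain file in place of a `FileChannel`); the `Journal` class and its record framing are hypothetical, illustrating only the batching idea:

```python
import os
import tempfile

# Hypothetical sketch of JMQ-style journal writes: records are staged in an
# in-memory buffer, then flushed to the journal in one sequential write,
# avoiding mmap (and its page faults) on the hot path.
class Journal:
    def __init__(self, path: str):
        self.f = open(path, "ab", buffering=0)
        self.buf = bytearray()

    def append(self, payload: bytes) -> None:
        # Length-prefixed record staged in memory; no disk I/O yet.
        self.buf += len(payload).to_bytes(4, "big") + payload

    def flush(self) -> None:
        # One sequential write + fsync covers all staged records.
        self.f.write(bytes(self.buf))
        os.fsync(self.f.fileno())
        self.buf.clear()

path = os.path.join(tempfile.mkdtemp(), "journal.0")
j = Journal(path)
j.append(b"hello")
j.append(b"world")
j.flush()
print(os.path.getsize(path))  # 18: two 4-byte length headers + 10 payload bytes
```

Batching many records into one write is what keeps the flush sequential even when producers are highly concurrent.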
Read path – Dense fixed‑length indexes allow direct calculation of an index entry’s file position (pos = indexNumber * indexLength). Over 99% of reads hit an off‑heap cache, eliminating random disk reads.
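The position arithmetic extends naturally across rolled index files. In this sketch the 16-byte entry length and per-file capacity are assumed values for illustration, not JMQ's actual constants:

```python
INDEX_LENGTH = 16             # assumed bytes per dense-index entry
FILE_CAPACITY = 1024 * 1024   # assumed entries per index file (for rolling)

# Hypothetical sketch of JMQ-style dense-index lookup: because every message
# has a fixed-length entry, locating one is pure arithmetic - no search.
def locate_index(index_number: int) -> tuple[int, int]:
    file_no, slot = divmod(index_number, FILE_CAPACITY)
    return file_no, slot * INDEX_LENGTH

print(locate_index(1_048_577))  # (1, 16): second file, second entry
```

Contrast this with Kafka's sparse index, which trades a binary search plus a short log scan for a much smaller index footprint.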
Design benefits:
High concurrency through PartitionGroup design.
Low write latency via DirectBuffer and Raft‑based replication.
Efficient dense indexing suited for large micro‑service clusters.
Comparative Highlights
Both Kafka and JMQ use sequential append‑only logs, but JMQ’s PartitionGroup model provides higher write concurrency and more stable synchronous‑write performance under network latency. RocketMQ’s three‑layer design isolates write‑only CommitLog from read‑optimized ConsumeQueue and global IndexFile, enabling fast key‑based queries.
Key takeaways:
Kafka achieves ultra‑high throughput with PageCache, sequential I/O, and mmap.
RocketMQ separates storage and indexing, using memory‑mapped ConsumeQueue and hash‑based IndexFile for low‑latency reads.
JMQ avoids mmap on writes, uses DirectBuffer and off‑heap caching, and offers dense fixed‑length indexes for predictable read latency.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
