How Kafka Hits Million‑Message Throughput with Disk Writes, Caching & Zero‑Copy
Kafka sustains throughput in the millions of messages by appending logs sequentially to disk, relying on the OS page cache to absorb writes and batch flushes, sending messages in large batches, and using zero‑copy transfer to move data from the log file to the network socket without passing through user space, which cuts I/O overhead and CPU usage.
Kafka is a critical middleware for large‑scale architectures; its high‑throughput design relies on several key techniques.
Sequential Disk Writes
Kafka uses an append‑only log where messages are written to disk in the order they arrive, avoiding random seeks and taking advantage of the high speed of sequential writes.
Traditional mechanical disks suffer from long seek times during random I/O, while sequential writes keep the disk head moving minimally, reducing latency.
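The append‑only idea can be sketched in a few lines of Python. This is an illustration only: the class name AppendOnlyLog, the helper read_records, and the 4‑byte length prefix are assumptions made for the example and do not reflect Kafka's actual on‑disk record format.

```python
import os
import struct
import tempfile

# 4-byte big-endian length prefix: illustrative framing, not Kafka's format.
_LEN = struct.Struct(">I")


class AppendOnlyLog:
    """Hypothetical minimal log: records are only ever added at the end."""

    def __init__(self, path):
        # Append mode: every write goes to the current end of the file,
        # so the device sees one forward-moving stream of writes.
        self._file = open(path, "ab")
        self._end = os.path.getsize(path)

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte position where it starts."""
        position = self._end
        frame = _LEN.pack(len(payload)) + payload
        self._file.write(frame)
        self._end += len(frame)
        return position

    def close(self):
        self._file.close()


def read_records(path):
    """Read the log front to back, which is also a sequential access pattern."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(_LEN.size)
            if len(header) < _LEN.size:
                break
            (length,) = _LEN.unpack(header)
            records.append(f.read(length))
    return records
```

Because nothing is ever rewritten in place, both the writer and a reader that scans from a known position touch the file strictly in order.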
Page Cache
The OS page cache stores data in memory before it is flushed to the physical disk, allowing Kafka to batch writes and minimize small I/O operations.
When a consumer reads data, the kernel first checks the page cache; a cache hit returns the data from memory instantly.
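To make the split between "written" and "flushed" concrete, here is a small Python sketch. The function name append_with_periodic_sync and its sync_every parameter are hypothetical; Kafka's own flush behaviour is controlled by broker settings such as log.flush.interval.messages and log.flush.interval.ms, and this code does not reproduce that logic.

```python
import os
import tempfile


def append_with_periodic_sync(path, records, sync_every):
    """Write every record, but force data to the device only occasionally.

    Returns the number of fsync calls that were made.
    """
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for i, record in enumerate(records, start=1):
            # write() returns once the kernel has accepted the bytes,
            # normally by placing them in the page cache.
            os.write(fd, record)
            if i % sync_every == 0:
                # fsync() asks the kernel to push dirty pages to storage.
                os.fsync(fd)
                syncs += 1
    finally:
        os.close(fd)
    return syncs
```

A read issued right after the writes sees all the data even if no fsync has happened yet, because the kernel serves it from the same cached pages.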
Batch Sending
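Instead of sending each message individually, the producer groups many messages into a larger request. This spreads the per‑request network overhead across the whole batch and raises throughput, at the cost of a small, configurable wait while the batch fills.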
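The size‑or‑time trigger behind batching can be sketched as follows. BatchAccumulator and its parameters are hypothetical names chosen for this example; the real Kafka producer is considerably more involved (per‑partition batches, a background sender thread, retries), so treat this as a simplified model of the idea rather than the producer's implementation.

```python
import time


class BatchAccumulator:
    """Hypothetical accumulator: emit a batch when it is full or has waited long enough."""

    def __init__(self, batch_size_bytes, linger_seconds, clock=time.monotonic):
        self._batch_size_bytes = batch_size_bytes
        self._linger_seconds = linger_seconds
        self._clock = clock  # injectable so the timing can be tested
        self._buffer = []
        self._buffered_bytes = 0
        self._first_append_at = None

    def append(self, message: bytes):
        """Buffer a message; return a list of messages if a batch is ready, else None."""
        if not self._buffer:
            self._first_append_at = self._clock()
        self._buffer.append(message)
        self._buffered_bytes += len(message)
        return self._drain_if_ready()

    def poll(self):
        """Called periodically so a partly filled batch is not held forever."""
        return self._drain_if_ready()

    def _drain_if_ready(self):
        if not self._buffer:
            return None
        full = self._buffered_bytes >= self._batch_size_bytes
        expired = self._clock() - self._first_append_at >= self._linger_seconds
        if full or expired:
            batch, self._buffer = self._buffer, []
            self._buffered_bytes = 0
            return batch
        return None
```

In this sketch batch_size_bytes plays the role of a size limit and linger_seconds the role of a maximum wait. The real producer exposes comparable knobs as configuration properties, for example: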
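# Maximum size of one per-partition batch, in bytes (16 KB)
batch.size=16384
# Wait up to 5 ms for more messages before sending a batch
linger.ms=5
# Total memory the producer may use to buffer unsent records (32 MB)
buffer.memory=33554432
# Compress each batch with Snappy
compression.type=snappy
Zero‑Copy Transfer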
Zero‑copy moves data from the log file (often already resident in the page cache) to the network socket buffer entirely inside the kernel, eliminating the copies into and out of user space and lowering CPU usage.
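As a rough illustration, Python's standard library exposes the same mechanism through socket.sendfile, which uses os.sendfile where the platform provides it and falls back to ordinary send calls otherwise. The function name send_log_slice and the idea of serving a byte range of a log file are assumptions made for this sketch, not Kafka's actual code path.

```python
import os
import socket
import tempfile


def send_log_slice(sock, log_path, offset, count):
    """Send `count` bytes of the file starting at `offset` over a connected socket.

    Returns the number of bytes sent. No read() into a Python buffer is
    needed when the platform supports sendfile.
    """
    with open(log_path, "rb") as log_file:
        return sock.sendfile(log_file, offset=offset, count=count)
```

The application never sees the payload bytes: it only names the file, the starting position, and the length. The Linux system call underneath has the following C signature: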
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
By combining sequential writes, page‑cache flushing, batch sending, and zero‑copy, Kafka dramatically reduces I/O overhead and achieves million‑level message throughput.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!
