
How Kafka Hits Million‑Message Throughput with Disk Writes, Caching & Zero‑Copy

Kafka reaches million-message-level throughput by appending to its log sequentially on disk, letting the OS page cache absorb writes and flush them in bulk, sending messages in large batches, and using zero-copy transfer so that data moves from the page cache to network sockets without passing through user space. Together these techniques cut I/O overhead and CPU usage.

Mike Chen's Internet Architecture

Kafka is a core piece of middleware in many large-scale architectures. Its high throughput comes from four techniques: sequential disk writes, the OS page cache, batch sending, and zero-copy transfer.

Sequential Disk Writes

Kafka uses an append‑only log where messages are written to disk in the order they arrive, avoiding random seeks and taking advantage of the high speed of sequential writes.

On traditional mechanical disks, random I/O pays a long seek on nearly every operation. Sequential writes keep head movement to a minimum, so the disk spends its time transferring data rather than seeking.
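The append-only idea can be illustrated with a small Python sketch. This is a toy model, not Kafka's on-disk format: the AppendOnlyLog class, the read_all helper, and the 4-byte length prefix are all assumptions made for the example.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy append-only log: each record is a 4-byte length prefix plus payload.

    Hypothetical illustration only; it assumes the file starts out empty.
    """

    def __init__(self, path: str) -> None:
        # "ab" opens in append mode, so every write lands at the end of the file.
        self._file = open(path, "ab")
        self._next_offset = 0  # logical record number, not a byte position

    def append(self, payload: bytes) -> int:
        """Append one record and return its logical offset."""
        self._file.write(struct.pack(">I", len(payload)) + payload)
        offset = self._next_offset
        self._next_offset += 1
        return offset

    def close(self) -> None:
        self._file.close()


def read_all(path: str) -> list:
    """Scan the file front to back and return every record payload in order."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            records.append(f.read(length))
    return records
```

Because records are only ever added at the end, neither the writer nor a reader scanning forward needs to jump around the file.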

Page Cache

Writes land first in the OS page cache, in memory, and the kernel flushes them to the physical disk later. This lets many small writes be merged into fewer, larger disk operations.

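When a consumer reads data, the kernel checks the page cache first; on a cache hit the data is served from memory with no disk read. Consumers that keep up with producers mostly read recently written data, so they tend to hit the cache.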
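The difference between writing to the page cache and forcing data to disk can be sketched in Python. The write_buffered function and its fsync_at_end parameter are hypothetical names for this example; how and when Kafka itself flushes depends on broker configuration and is not shown here.

```python
import os
import tempfile
from typing import Iterable


def write_buffered(path: str, records: Iterable[bytes], fsync_at_end: bool = True) -> int:
    """Write many small records, then optionally force a single flush to disk.

    Returns the total number of bytes written.
    """
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for record in records:
            # os.write returns once the bytes are in the kernel's page cache;
            # the physical disk write can happen later.
            written += os.write(fd, record)
        if fsync_at_end:
            # One fsync covers everything written so far, instead of
            # paying for a disk flush per record.
            os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Calling os.fsync once per batch rather than once per record is the same trade the page cache makes on its own: fewer, larger flushes.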

Batch Sending

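Instead of sending each message individually, the producer groups many messages into one larger request. This spreads the per-request network overhead across the whole batch and raises throughput, at the cost of a small added delay bounded by linger.ms.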

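# Maximum size of one batch, in bytes (16 KB)
batch.size=16384
# Wait up to 5 ms for more messages before sending a batch
linger.ms=5
# Total memory the producer may use for buffering (32 MB)
buffer.memory=33554432
# Compress each batch with Snappy
compression.type=snappy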
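How batch.size and linger.ms interact can be sketched with a small accumulator in Python. This is a simplified model, not the Kafka client's implementation: BatchAccumulator, its send callback, and the injectable clock are hypothetical, and the sketch flushes once the buffered size reaches the limit.

```python
import time
from typing import Callable, List, Optional


class BatchAccumulator:
    """Buffers messages and sends them as one batch (hypothetical sketch).

    A batch is sent when it reaches batch_size bytes, or when the oldest
    buffered message has waited linger_ms and poll() is called.
    """

    def __init__(
        self,
        send: Callable[[List[bytes]], None],
        batch_size: int = 16384,
        linger_ms: float = 5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._batch_size = batch_size
        self._linger_s = linger_ms / 1000.0
        self._clock = clock
        self._buffer: List[bytes] = []
        self._buffered_bytes = 0
        self._first_append_at: Optional[float] = None

    def append(self, message: bytes) -> None:
        if self._first_append_at is None:
            self._first_append_at = self._clock()
        self._buffer.append(message)
        self._buffered_bytes += len(message)
        if self._buffered_bytes >= self._batch_size:
            self.flush()  # size trigger

    def poll(self) -> None:
        """Flush if the oldest buffered message has waited at least linger_ms."""
        if self._first_append_at is None:
            return
        if self._clock() - self._first_append_at >= self._linger_s:
            self.flush()  # time trigger

    def flush(self) -> None:
        if self._buffer:
            self._send(self._buffer)
        self._buffer = []
        self._buffered_bytes = 0
        self._first_append_at = None
```

A larger batch_size or linger_ms means fewer, bigger requests; a smaller value sends sooner but with more requests per message.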

Zero‑Copy Transfer

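Zero-copy lets the kernel move data from the file's page-cache pages straight to the network socket, skipping the round trip through a user-space buffer and lowering CPU usage. On Linux this is the sendfile system call, which Kafka reaches through Java's FileChannel.transferTo: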

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
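The same system call is exposed in Python as os.sendfile, which makes for a compact illustration. The sketch below assumes a Linux host; send_file_zero_copy and demo are hypothetical helpers, and a local socket pair stands in for a real consumer connection.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send the whole file at `path` over `sock` with sendfile(2); return bytes sent."""
    sent_total = 0
    with open(path, "rb") as src:
        size = os.fstat(src.fileno()).st_size
        while sent_total < size:
            # The kernel copies page-cache data to the socket; the bytes
            # are never read into a Python (user-space) buffer.
            sent = os.sendfile(sock.fileno(), src.fileno(), sent_total, size - sent_total)
            if sent == 0:  # unexpected end of file
                break
            sent_total += sent
    return sent_total


def demo() -> bytes:
    """Write a small file, push it through a local socket pair, return what arrived."""
    payload = b"kafka-log-segment " * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        send_file_zero_copy(sender, path)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

The conventional alternative is a read() into an application buffer followed by a write() to the socket, which copies the data into user space and back out again.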

By combining sequential writes, page-cache buffering, batch sending, and zero-copy transfer, Kafka keeps I/O overhead low and reaches million-message-level throughput.
