How Kafka Achieves Million‑TPS Through Sequential I/O, MMAP, and Zero‑Copy
Kafka sustains millions of transactions per second (TPS) through four complementary techniques: sequential disk writes, memory‑mapped files, zero‑copy DMA transfers, and message batching. Each reduces I/O overhead and CPU involvement, and together they make Kafka one of the highest‑throughput components in big‑data pipelines.
When people think of big‑data transmission, Kafka immediately comes to mind as a "killer app" due to its ability to handle millions of transactions per second (TPS). It has become a favorite in the industry for data collection, transmission, and storage.
How does Kafka achieve million‑TPS?
Kafka relies on several key techniques:
Sequential Disk Read/Write
Both producers and consumers perform sequential reads and writes. Sequential access on modern HDDs and SSDs is far faster than random access; for spinning disks, some benchmarks show sequential disk reads even outperforming random reads from memory. Kafka therefore stores data by appending to the end of the active log file of each partition, an access pattern that is inherently sequential.
Traditional mechanical disks consist of multiple platters, each with two recordable surfaces. A track is a concentric circle on a surface; the set of tracks at the same radius across all surfaces forms a cylinder. Each track is divided into sectors, the smallest addressable unit (typically 512 bytes). Data is read and written in blocks, which are groups of sectors.
Sequential read/write: accessing blocks in the order they are physically laid out, so the head rarely has to move. Random read/write: accessing blocks at arbitrary locations, so each access may incur a full seek and rotational delay.
When a block is read sequentially, the disk head can continue to the next block without seeking, whereas random reads require a full seek‑rotate‑transfer cycle.
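The append-only pattern can be sketched in a few lines of Python. This is a toy illustration with a made-up record format (a 4-byte length prefix plus payload), not Kafka's actual on-disk layout; the point is that every write lands at the end of the file, so the disk sees a purely sequential workload.

```python
import os
import struct
import tempfile

# Toy append-only log (hypothetical format, NOT Kafka's real one): each record
# is a 4-byte big-endian length prefix followed by the payload. Appends always
# go to the end of the file, producing a purely sequential write pattern.
class AppendOnlyLog:
    def __init__(self, path):
        self.f = open(path, "ab")

    def append(self, payload: bytes) -> int:
        offset = self.f.tell()                      # byte position where this record starts
        self.f.write(struct.pack(">I", len(payload)))
        self.f.write(payload)
        return offset

    def close(self):
        self.f.flush()
        self.f.close()

path = os.path.join(tempfile.mkdtemp(), "partition-0.log")
log = AppendOnlyLog(path)
offsets = [log.append(f"msg-{i}".encode()) for i in range(3)]
log.close()
print(offsets)  # [0, 9, 18] — each record starts exactly where the previous one ended
```

Because readers consume records in the same order they were appended, reads are sequential too, which is what lets both sides of Kafka run at near disk-streaming speed.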
Memory‑Mapped Files (MMAP)
To further reduce I/O latency, Kafka can map disk files directly into the process address space using MMAP (memory‑mapped files). The application then reads and writes the file as if it were ordinary memory, while the operating system handles paging and flushes dirty pages to disk in the background, avoiding an extra copy between user space and kernel space.
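The same idea can be shown from Python's `mmap` module (Kafka does this from Java NIO; the file name below is made up for the sketch). A write to the mapping is a plain memory store, and the OS later carries it to disk:

```python
import mmap
import os
import tempfile

# Illustrative memory-mapped file I/O (a Python sketch, not Kafka code).
path = os.path.join(tempfile.mkdtemp(), "index.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)                 # pre-allocate one page of file space

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)           # map the whole file into our address space
    mm[0:5] = b"hello"                      # a plain memory write, no write() syscall
    data = bytes(mm[0:5])
    mm.flush()                              # OS writes dirty pages back lazily; flush forces it
    mm.close()

with open(path, "rb") as f:
    print(f.read(5))                        # b'hello' — the memory write reached the file
```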
Zero‑Copy (DMA)
Traditional I/O moves the data four times: disk → kernel buffer (DMA), kernel buffer → user buffer (CPU), user buffer → socket buffer (CPU), socket buffer → NIC (DMA). Kafka's zero‑copy path (the sendfile system call, reached through Java's FileChannel.transferTo) cuts this to two copies: disk → kernel buffer (DMA) and kernel buffer → NIC (DMA), so the CPU never touches the payload.
Benchmarks comparing sendfile against the read/write path have shown transfer time shrinking by roughly 65 %, which translates directly into higher throughput.
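The same syscall can be exercised from Python on Linux via `os.sendfile` (a sketch, not Kafka code; the file name and socketpair stand in for a real log segment and consumer connection). The kernel moves the file's bytes from the page cache straight into the socket; the process never copies them through a user-space buffer.

```python
import os
import socket
import tempfile

# Zero-copy transfer with sendfile(2) on Linux (illustrative sketch).
path = os.path.join(tempfile.mkdtemp(), "segment.log")
payload = b"batch-of-messages" * 10
with open(path, "wb") as f:
    f.write(payload)

src, dst = socket.socketpair()              # stand-in for a real consumer connection
with open(path, "rb") as f:
    # Kernel-to-kernel copy: file -> page cache -> socket, no user-space buffer.
    sent = os.sendfile(src.fileno(), f.fileno(), 0, len(payload))

received = dst.recv(len(payload))
src.close()
dst.close()
print(sent, len(received))
```

On the read/write path, the same transfer would require the process to pull every byte into a user buffer and push it back out, doubling the copies and the CPU work.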
Batch Data Processing
Instead of sending one message at a time, Kafka groups many messages into a batch and transmits the whole batch at once. As a rough illustration, if 10 MB of data holds one million small messages and can be shipped in about one second over a fast network, that alone is 1 M TPS. Consumers track their progress with offsets, and batches can be compressed to further reduce network I/O.
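Why batching pairs so well with compression can be shown with a toy Python measurement: many small, similar messages compress far better as one batch than one at a time, because per-message compression pays the format's fixed overhead repeatedly and cannot exploit redundancy across messages.

```python
import gzip
import json

# Toy illustration of batch-then-compress (not Kafka's codec pipeline):
# 1000 similar JSON messages, compressed individually vs. as one batch.
messages = [json.dumps({"user": i, "event": "click"}).encode() for i in range(1000)]

individual = sum(len(gzip.compress(m)) for m in messages)   # compress each message alone
batched = len(gzip.compress(b"\n".join(messages)))          # compress one joined batch

print(individual, batched)   # the batch comes out dramatically smaller
```

Kafka applies the same principle: the producer compresses whole record batches, and brokers can hand those compressed batches to consumers without re-encoding them.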
Summary
The main reasons Kafka can support million‑level TPS are:
(1) Sequential writes to the end of each partition, ensuring optimal disk throughput.
(2) Use of MMAP to map disk files into memory, allowing fast memory‑like access.
(3) Zero‑copy DMA transfers that eliminate unnecessary CPU copies.
(4) Sending data in large batches (often via sendfile) and compressing them to reduce network overhead.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.