How Kafka Achieves Million‑Level TPS: Sequential Disk I/O, MMAP, Zero‑Copy, and Batch Processing
This article explains how Kafka attains million‑level transactions per second by using sequential disk reads/writes, memory‑mapped files, DMA‑based zero‑copy transfers, and batch data transmission, detailing each technique and its impact on throughput and latency.
How Kafka Achieves Million‑Level TPS
Kafka is renowned as a high‑throughput message broker for big‑data pipelines, capable of handling millions of transactions per second (TPS) thanks to several architectural optimizations.
Sequential Disk Read/Write
Both producers and consumers perform sequential reads and writes, which are far faster than random access on traditional HDDs or SSDs; Kafka stores each partition as an append‑only log file, always writing new records to the file tail.
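The append-only pattern can be sketched in a few lines. This is a minimal illustration, not Kafka's actual segment format (which carries batch headers, CRCs, and timestamps); the class name, file name, and length-prefixed record layout are all hypothetical.

```python
import os

# Minimal sketch of an append-only partition log. Opening in append
# mode means every write lands at the file tail, so an HDD head never
# seeks between writes and an SSD sees purely sequential pages.
class AppendOnlyLog:
    def __init__(self, path):
        self.f = open(path, "ab")

    def append(self, record: bytes) -> int:
        offset = self.f.tell()                    # byte position of this record
        self.f.write(len(record).to_bytes(4, "big"))  # 4-byte length prefix
        self.f.write(record)
        return offset

    def flush(self):
        self.f.flush()
        os.fsync(self.f.fileno())                 # force data to stable storage

path = "partition-0.log"                          # illustrative file name
if os.path.exists(path):
    os.remove(path)

log = AppendOnlyLog(path)
first = log.append(b"hello")                      # lands at offset 0
second = log.append(b"world")                     # lands right after: offset 9
log.flush()
```

Because records are never updated in place, the returned byte offsets can serve as stable consumer positions, which is essentially how Kafka's log offsets work.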
Traditional mechanical disks consist of platters, heads, tracks, cylinders, and sectors; random access pays a head‑seek plus rotational‑latency penalty on every operation, whereas sequential access amortizes a single seek across many consecutive reads or writes, which is why append‑only I/O is so much faster.
Memory‑Mapped Files (MMAP)
Kafka maps partition files into virtual memory using MMAP, allowing the operating system to handle paging and enabling near‑memory‑speed I/O while eliminating user‑space to kernel‑space copy overhead.
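Python's standard `mmap` module can demonstrate the idea: once a file is mapped, reads and writes go through the OS page cache as ordinary memory accesses, with no `read()`/`write()` syscall copying bytes into a user buffer. The file name and contents below are illustrative only.

```python
import mmap

# Pre-size a demo file: mmap requires a non-empty file of known length.
path = "mmap-demo.log"                      # illustrative file name
with open(path, "wb") as f:
    f.write(b"kafka mmap demo " * 64)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:    # map the entire file
        head = mm[:5]                       # read straight from the page cache
        mm[0:5] = b"KAFKA"                  # in-place write through the mapping
        mm.flush()                          # ask the OS to write dirty pages back

with open(path, "rb") as f:
    data = f.read(16)                       # the change is visible on disk
```

The `flush()` call mirrors the trade-off Kafka manages: the OS decides when dirty pages hit disk unless the application forces a sync, trading durability guarantees for write speed.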
Zero‑Copy
By leveraging Direct Memory Access (DMA), Kafka reduces data copies: instead of the typical four‑copy path (disk→kernel buffer→user memory→socket buffer→NIC), data moves straight from the kernel's page cache to the NIC (disk→kernel buffer→NIC), so the payload never enters user space and CPU overhead drops dramatically.
The resulting zero‑copy path is commonly reported to cut data transfer time by up to 65%, a key factor in Kafka’s ability to sustain high TPS.
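On the JVM, Kafka achieves this with `FileChannel.transferTo`, which on Linux ultimately calls `sendfile(2)`. The same syscall is exposed in Python as `os.sendfile`, so the kernel-to-socket path can be sketched directly; the file name is illustrative and the socket pair stands in for a real consumer connection.

```python
import os
import socket

# Prepare a demo "log segment" to serve to a consumer.
path = "segment.log"                          # illustrative file name
with open(path, "wb") as f:
    f.write(b"x" * 4096)

# A connected socket pair stands in for a broker->consumer TCP connection.
server_side, client_side = socket.socketpair()

with open(path, "rb") as f:
    # sendfile(2): the kernel moves file pages to the socket buffer
    # directly; the 4096 payload bytes never enter this process's memory.
    sent = os.sendfile(server_side.fileno(), f.fileno(), 0, 4096)

received = b""
while len(received) < sent:                   # drain the socket on the "consumer" side
    received += client_side.recv(8192)

server_side.close()
client_side.close()
```

Compare this with a conventional `f.read()` followed by `sock.send()`: that path copies every byte into a user-space buffer and back out again, which is exactly the overhead zero-copy eliminates.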
Batch Data Transfer
Instead of sending one record at a time, Kafka transmits large batches (up to whole log segments) to consumers, which enables network‑level compression and amortizes per‑request overhead across many records; at that point even modest bandwidth such as 10 MB/s can carry on the order of 100,000 small (~100‑byte) messages per second.
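The batching-plus-compression idea can be sketched with the standard library. The record format, batch size, and use of gzip here are illustrative (Kafka supports gzip alongside snappy, lz4, and zstd, with its own binary batch format):

```python
import gzip

# 1,000 small records; sending each one individually would cost
# 1,000 network round trips plus per-request protocol overhead.
records = [f"event-{i}".encode() for i in range(1000)]

# Producer-side batching: pack the records into one blob and compress
# it, so the whole batch ships in a single send. Repetitive payloads
# (like these) compress especially well across a batch.
batch = b"\n".join(records)
compressed = gzip.compress(batch)
ratio = len(compressed) / len(batch)

# Consumer side: one receive, one decompress, then split back into records.
restored = gzip.decompress(compressed).split(b"\n")
```

Compression over a whole batch is far more effective than compressing records one by one, because the compressor can exploit redundancy between neighboring records.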
Summary
Kafka’s million‑TPS capability stems from four core techniques: (1) sequential append writes to partitions, (2) MMAP‑based memory mapping, (3) DMA‑driven zero‑copy transfers, and (4) batch‑oriented network transmission, all of which minimize I/O latency and CPU work.
Published by the High Availability Architecture official account.