How Kafka Achieves Ultra‑High Throughput: Sequential I/O, Zero‑Copy, and More
This article explains how Kafka’s design—using sequential disk reads/writes, zero‑copy system calls, file segmentation, batch sending, and message compression—delivers massive throughput while minimizing performance loss and network load.
Kafka is a distributed messaging system designed to handle massive volumes of messages. It persists every message to disk, accepting a small performance cost in exchange for large, durable storage capacity.
Sequential Read/Write
Kafka appends messages to the end of log files, taking advantage of the sequential read/write performance of disks. Sequential I/O avoids repeated head seeks and incurs only minimal rotational delay, which makes it far faster than random I/O on spinning disks. The Kafka documentation cites a six-disk 7200 rpm SATA RAID-5 array where linear writes reach about 600 MB/s while random writes manage only about 100 KB/s, a difference of more than 6000x.
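As a rough illustration of the append-only idea (not Kafka's actual storage format), the Python sketch below writes length-prefixed records to the end of a file and reads them back in a single forward pass. The `AppendOnlyLog` class, the `scan` helper, and the 4-byte length framing are assumptions made for this example.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy append-only log: each record is a 4-byte big-endian length plus payload."""

    def __init__(self, path):
        self.path = path
        # "ab" opens for appending, so every write lands at the current end of file.
        self._file = open(path, "ab")

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte position where it starts."""
        position = self._file.tell()
        self._file.write(struct.pack(">I", len(payload)) + payload)
        return position

    def close(self):
        # Closing flushes any buffered bytes to the operating system.
        self._file.close()


def scan(path):
    """Yield every payload by reading the file front to back, never seeking."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            yield f.read(length)


def demo():
    """Write three records to a fresh file and read them back."""
    path = os.path.join(tempfile.mkdtemp(), "demo.log")
    log = AppendOnlyLog(path)
    positions = [log.append(m) for m in (b"first", b"second", b"third")]
    log.close()
    return positions, list(scan(path))
```

Neither the writer nor the reader ever jumps around in the file: writes only grow the tail, and reads walk forward from the start. That is the access pattern that keeps a disk in its fast sequential mode.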
Zero‑Copy
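In a conventional file-to-network transfer, data is read from disk into the kernel page cache, copied into a user-space buffer, copied back into a kernel socket buffer, and finally handed to the network card. That path involves four copies and four context switches between user and kernel mode. The sendfile system call, available since Linux kernel 2.2, instead moves data from the page cache to the socket entirely inside the kernel, which eliminates the user-buffer copy and cuts the context switches to two. Kafka reaches it through Java's FileChannel.transferTo. This is often cited as roughly doubling transfer performance, though the actual gain depends on workload and hardware.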
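To make the idea concrete, here is a minimal Python sketch rather than Kafka's own Java code path. It relies on the standard library's `socket.sendfile()`, which uses `os.sendfile()` where the platform offers it and otherwise falls back to an ordinary read/send loop. The helper names `send_file_zero_copy` and `demo`, and the use of a local socket pair in place of a real consumer connection, are assumptions for illustration.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send a whole file over a connected stream socket.

    Where os.sendfile() is available, the bytes travel from the page cache to
    the socket inside the kernel and never enter a user-space buffer.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    # 4 KiB payload: small enough to fit in the socket buffer without a reader thread.
    payload = b"kafka-log-bytes " * 256
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)

    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(tmp.name, sender)
        sender.close()  # signals end of stream to the receiver

        received = b""
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received += chunk
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return sent, received == payload
```

The application code never touches the file contents; it only tells the kernel which file to send and where to send it.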
File Segmentation
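Kafka topics are divided into partitions, and each partition's log is further split into segments, so messages are stored across many segment files rather than one ever-growing file. Each file operation therefore deals with a file of bounded size, which keeps I/O lightweight and lets expired data be removed by deleting whole segments. Partitions, in turn, allow reads and writes to proceed in parallel.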
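One practical consequence of segmentation is that locating a message starts with picking the right file. Kafka names each segment file after the offset of its first message, zero-padded to 20 digits, so the lookup can be a binary search over those base offsets. The sketch below illustrates that step only; the function names `segment_file_name` and `find_segment` are hypothetical, and a real broker then consults an index file inside the segment.

```python
import bisect


def segment_file_name(base_offset: int) -> str:
    """Zero-padded 20-digit file name, following Kafka's on-disk naming convention."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that would contain target_offset.

    base_offsets must be sorted ascending. Each segment covers offsets from its
    own base offset up to, but not including, the next segment's base offset.
    """
    index = bisect.bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise KeyError(f"offset {target_offset} is before the first segment")
    return base_offsets[index]
```

For example, with segments starting at offsets 0, 1000, and 2500, a request for offset 1700 resolves to the segment whose base offset is 1000, and only that one file has to be opened.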
Batch Sending
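Kafka batches messages in memory before sending them in a single request. Producers can trigger a send when a certain number of messages accumulate (e.g., 100 messages) or after a time interval (e.g., every 5 seconds), which greatly reduces the number of network requests and server I/O operations. In the current Java producer the corresponding settings are batch.size, which is measured in bytes rather than messages, and linger.ms.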
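The two triggers described above, a count threshold and a time limit, can be sketched in a few lines of Python. This is a simplified single-threaded model, not the Kafka producer's implementation: the `BatchAccumulator` class, its parameter names, and the explicit `poll()` call (standing in for a background sender thread) are all assumptions for illustration.

```python
import time


class BatchAccumulator:
    """Collects messages and hands them to send_fn one batch at a time."""

    def __init__(self, send_fn, max_messages=100, max_wait_seconds=5.0,
                 clock=time.monotonic):
        self._send_fn = send_fn
        self._max_messages = max_messages
        self._max_wait_seconds = max_wait_seconds
        self._clock = clock  # injectable so the timing logic can be tested
        self._buffer = []
        self._first_message_at = None

    def add(self, message):
        """Buffer a message; flush immediately once the count threshold is hit."""
        if not self._buffer:
            self._first_message_at = self._clock()
        self._buffer.append(message)
        if len(self._buffer) >= self._max_messages:
            self.flush()

    def poll(self):
        """Call periodically: flush if the oldest buffered message waited too long."""
        if self._buffer and (
            self._clock() - self._first_message_at >= self._max_wait_seconds
        ):
            self.flush()

    def flush(self):
        """Send whatever is buffered as a single batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_message_at = None
            self._send_fn(batch)
```

With `max_messages=100`, a burst of 1,000 messages results in 10 calls to `send_fn` instead of 1,000, while the time limit bounds how long a lone message can sit in the buffer.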
Data Compression
Kafka supports compressing message batches using GZIP or Snappy, and newer versions also support LZ4 and Zstandard. Compression reduces the amount of data transmitted, easing network load. Although consumers must decompress the data, the CPU overhead is usually acceptable because network bandwidth, not CPU, is typically the bottleneck in large-scale data processing.
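Compressing a whole batch tends to work better than compressing messages one by one, because similar messages share repeated field names and values that the compressor can exploit. The Python sketch below compares the two approaches using the standard library's `gzip` module. The message shape and the helper names `make_messages` and `compressed_sizes` are invented for this example, and joining messages with newlines is a simplification of Kafka's actual batch format.

```python
import gzip
import json


def make_messages(count=200):
    """Build similar-looking JSON messages, as event streams often contain."""
    return [
        json.dumps(
            {"user_id": i, "event": "page_view", "path": "/products/list"}
        ).encode()
        for i in range(count)
    ]


def compressed_sizes(messages):
    """Return (raw bytes, bytes if each message is gzipped alone, bytes if gzipped as one batch)."""
    raw = sum(len(m) for m in messages)
    per_message = sum(len(gzip.compress(m)) for m in messages)
    batched = len(gzip.compress(b"\n".join(messages)))
    return raw, per_message, batched
```

Calling `compressed_sizes(make_messages())` shows the batched figure well below both the raw total and the per-message total; exact numbers vary with the data and the compression level.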
Java High-Performance Architecture