
Why Kafka Can Achieve Million‑Message‑Per‑Second Throughput: Disk Sequential Write, Zero‑Copy, Page Cache, and Memory‑Mapped Files

This article explains how Kafka attains ultra-high write throughput by leveraging disk sequential writes, zero-copy data transfer, the operating-system page cache, and memory-mapped files, detailing each technique's impact on latency, CPU usage, and overall performance.

Mike Chen's Internet Architecture

Kafka can handle millions of messages per second, and this performance stems from four core techniques: disk sequential writes, zero‑copy transmission, page‑cache utilization, and memory‑mapped files.

Disk Sequential Write

Kafka writes logs sequentially, avoiding random I/O. Sequential writes reduce the costly seek and rotation phases of mechanical disks, and even on SSDs they outperform random writes due to lower flash‑block management overhead.
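The append-only pattern can be sketched in a few lines. This is a simplified illustration, not Kafka's actual implementation; the class name, the length-prefix framing, and the file name "segment.log" are all assumptions made for the example.

```python
import os

# Minimal sketch of an append-only log segment: every record goes to
# the end of the file, so the OS issues purely sequential writes and
# never seeks to a random position.
class AppendOnlyLog:
    def __init__(self, path):
        # Start fresh so the demo is repeatable.
        if os.path.exists(path):
            os.remove(path)
        # "ab" opens in append mode: the file offset always points at
        # the end of the file.
        self.f = open(path, "ab")

    def append(self, record: bytes) -> int:
        # Length-prefix each record so the log can be replayed later.
        self.f.write(len(record).to_bytes(4, "big"))
        self.f.write(record)
        self.f.flush()
        return self.f.tell()  # current end-of-log position

log = AppendOnlyLog("segment.log")
log.append(b"msg-1")
log.append(b"msg-2")
```

Because consumers also read segments front to back, reads are sequential too, so both sides of the pipeline benefit from the disk's best-case access pattern.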

Zero‑Copy

Traditional data paths copy data from disk to application memory and then to the network buffer, incurring multiple copies. Kafka’s zero‑copy sends data directly from the OS page cache to the network interface, eliminating these extra copies, lowering CPU and memory‑bandwidth load, and boosting throughput.
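The underlying primitive is the sendfile(2) system call, which Python exposes through `socket.sendfile()`. The sketch below uses a socketpair and a throwaway file "payload.bin" as stand-ins for a real consumer connection and log segment; those names are assumptions for the example.

```python
import socket

# Sketch of the zero-copy send path: the kernel streams bytes from
# the page cache straight to the socket buffer, with no copy through
# user-space application memory.
with open("payload.bin", "wb") as f:
    f.write(b"x" * 16384)

server, client = socket.socketpair()  # 'client' plays the consumer

with open("payload.bin", "rb") as f:
    # socket.sendfile() uses os.sendfile() (zero-copy) where the
    # platform supports it, and falls back to plain send() otherwise.
    sent = server.sendfile(f)
server.close()

received = b""
while True:
    chunk = client.recv(65536)
    if not chunk:
        break
    received += chunk
client.close()
```

In the traditional path the same transfer costs four copies and two extra context switches (disk to page cache, page cache to user buffer, user buffer to socket buffer, socket buffer to NIC); sendfile collapses the user-space round trip entirely.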

Page Cache

When Kafka writes to a file, the data first lands in the operating system’s page cache. The OS asynchronously flushes dirty pages to disk, allowing Kafka to return from write calls quickly and increasing overall throughput while reducing latency.
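The difference between returning from the page cache and waiting for the disk can be seen directly. This sketch contrasts a buffered write with an explicit `fsync`; the file name "flush-demo.log" and the 1 MiB batch size are arbitrary choices for the example, and the default Kafka behavior is the first path, deferring flushes to the OS.

```python
import os
import time

data = b"m" * (1 << 20)  # a 1 MiB batch of messages

f = open("flush-demo.log", "wb")

t0 = time.perf_counter()
f.write(data)            # copies into the page cache and returns
buffered = time.perf_counter() - t0

t0 = time.perf_counter()
f.flush()
os.fsync(f.fileno())     # forces the dirty pages down to the device
synced = time.perf_counter() - t0
f.close()
```

On typical hardware the buffered write completes in microseconds while the fsync takes orders of magnitude longer, which is why Kafka relies on replication rather than per-message fsync for durability.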

Memory‑Mapped Files

Kafka memory-maps its index files (the offset and time indexes that sit alongside each log segment) into the process address space using mmap, enabling direct memory access to file contents without explicit read/write syscalls. This leverages the OS virtual-memory subsystem for efficient large-scale data handling.
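Python's mmap module demonstrates the mechanism. The sketch below maps a small hypothetical index file "index.bin" containing two 8-byte entries; the file layout is invented for illustration and does not match Kafka's real index format.

```python
import mmap

# Write a tiny fake index: two 8-byte big-endian entries.
with open("index.bin", "wb") as f:
    f.write((0).to_bytes(8, "big") + (100).to_bytes(8, "big"))

with open("index.bin", "rb") as f:
    # Map the whole file (length 0 = entire file) read-only.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Slice the mapping like a bytes object: page faults pull file
    # pages in on demand, served straight from the page cache, with
    # no read() syscall per access.
    first_offset = int.from_bytes(mm[0:8], "big")
    second_offset = int.from_bytes(mm[8:16], "big")
    mm.close()
```

Lookups become plain memory reads, and the kernel keeps hot index pages resident automatically, which is what makes offset searches cheap even on large segments.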

Combined, these mechanisms dramatically cut I/O latency, reduce CPU overhead, and enable Kafka’s ability to sustain extremely high write rates.

Tags: Big Data, Kafka, Zero Copy, high performance, page cache, memory-mapped, sequential write
Written by

Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!
