Why Is Kafka So Fast? Unveiling the Secrets Behind Its High Throughput

This article explains how Kafka achieves remarkable speed and massive throughput by using sequential disk I/O, OS page cache, zero‑copy transfers, partitioned log segments with indexes, batch processing, and efficient compression, making it a cornerstone of modern big‑data pipelines.

Programmer DD

Aug 30, 2021

Kafka is a ubiquitous messaging middleware in the big data field, widely used for real‑time data pipelines and stream processing.

Although Kafka stores data on disk, it achieves high throughput and low latency, often handling tens of thousands to millions of messages per second.

1. Sequential Read/Write

Kafka appends messages to the end of log files, using sequential disk I/O, which is orders of magnitude faster than random I/O; this design dramatically improves write throughput.
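As a rough illustration of the append-only idea, the sketch below writes length-prefixed records to the end of a file and reads them back in one sequential pass. The 4-byte length prefix and the function names (append, scan, demo) are assumptions made for this example; they are not Kafka's on-disk record format or API.

```python
import os
import struct
import tempfile

# 4-byte big-endian length prefix. Illustrative framing only,
# not Kafka's actual record format.
HEADER = struct.Struct(">I")


def append(log, payload: bytes) -> int:
    """Append one record at the end of the file and return its byte position."""
    log.seek(0, os.SEEK_END)
    position = log.tell()
    log.write(HEADER.pack(len(payload)) + payload)
    return position


def scan(log):
    """Read every record front to back in one sequential pass."""
    log.seek(0)
    while True:
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        (length,) = HEADER.unpack(header)
        yield log.read(length)


def demo():
    """Append three records to a temporary log and read them back."""
    with tempfile.TemporaryFile() as log:
        positions = [append(log, m) for m in (b"a", b"bb", b"ccc")]
        return positions, list(scan(log))
```

Writes only ever touch the end of the file and reads move forward through it, which is the access pattern that disks and the OS read-ahead logic handle best.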

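Each partition is stored as its own directory of append-only log files. Consuming a message does not delete it; instead, each consumer tracks its position in the partition with an offset. Older clients stored committed offsets in ZooKeeper, while newer clients commit them to Kafka's internal __consumer_offsets topic.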

Old data is eventually discarded under one of two retention policies: time-based or size-based.
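The two policies can be sketched as a function that picks the oldest log files to delete. This is a simplified model under stated assumptions: the Segment class, its fields, and segments_to_delete are hypothetical names for this example, and the broker's real cleanup logic has more rules. In Kafka itself the limits are set through configuration such as log.retention.hours and log.retention.bytes.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one log file of a partition."""
    base_offset: int
    size_bytes: int
    last_modified: float  # seconds since the epoch


def segments_to_delete(segments, now, max_age_seconds=None, max_total_bytes=None):
    """Return the oldest segments that violate the age or the size limit.

    The newest segment is treated as active and is never selected.
    """
    ordered = sorted(segments, key=lambda s: s.base_offset)
    closed = ordered[:-1]
    total = sum(s.size_bytes for s in ordered)
    doomed = []
    for seg in closed:
        too_old = max_age_seconds is not None and now - seg.last_modified > max_age_seconds
        too_big = max_total_bytes is not None and total > max_total_bytes
        if not (too_old or too_big):
            break  # everything newer is kept as well
        doomed.append(seg)
        total -= seg.size_bytes
    return doomed
```

Deleting whole files from the old end of the log keeps cleanup cheap: no record is rewritten and the append path is not disturbed.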

2. Page Cache

Kafka relies on the operating system's page cache rather than caching data in the JVM heap. This avoids per-object memory overhead and garbage-collection pauses, and lets Kafka benefit from OS-level optimizations such as write-behind and read-ahead, with the kernel flushing dirty pages to disk in the background.
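The difference between handing data to the page cache and forcing it to the device can be shown with ordinary file calls. The sketch below is a generic illustration and not Kafka code; write_then_read and its force_to_disk flag are names invented for this example.

```python
import os
import tempfile


def write_then_read(payload: bytes, force_to_disk: bool = False) -> bytes:
    """Write payload, optionally fsync it, then read it back from the same path."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()  # hands the bytes from Python's buffer to the OS
            if force_to_disk:
                os.fsync(f.fileno())  # asks the OS to persist them to the device now
        with open(path, "rb") as f:
            # A read this soon after the write is typically served from
            # the page cache rather than from the device.
            return f.read()
    finally:
        os.remove(path)
```

Without the fsync call, the write returns as soon as the kernel has the data in memory, and the kernel decides when to write it out. That is the write-behind behavior the paragraph above refers to.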

3. Zero‑Copy

On Linux, the sendfile system call moves data directly from the kernel page cache to the network socket without passing through user space. This removes copies and context switches between kernel and user space, which lowers CPU overhead and latency.

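Without zero-copy, sending a file over the network takes four copies: disk to page cache, page cache to a user-space buffer, user-space buffer to the kernel socket buffer, and socket buffer to the network card. With zero-copy, Kafka skips the two copies that pass through user space.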
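Kafka runs on the JVM and reaches sendfile through Java's FileChannel.transferTo. To keep the examples in this article in one language, the sketch below shows the same idea with Python's socket.sendfile, which uses os.sendfile where the platform supports it and falls back to ordinary send calls otherwise. The function name send_file_zero_copy is made up for this example, and the payload should stay small because sender and receiver share a single thread here.

```python
import socket
import tempfile


def send_file_zero_copy(payload: bytes) -> bytes:
    """Write payload to a temp file, send the file over a socket, return what arrived."""
    sender, receiver = socket.socketpair()
    with sender, receiver, tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        # The application never reads the file contents into its own buffer;
        # it only tells the kernel which file to send on which socket.
        sent = sender.sendfile(f)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    assert sent == len(payload)
    return b"".join(chunks)
```

The receiving side still copies data into user space when it calls recv. The saving is on the sending side, which is the broker's position when it serves consumers.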

4. Partitioning, Segmentation & Indexing

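Messages are stored per topic. Each topic is split into partitions, and each partition's log is further split into segments. Each segment's .log file has an accompanying sparse .index file that maps offsets to byte positions, so a reader can locate a message without scanning the whole log, and separate partitions can be processed in parallel.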
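A lookup by offset can be sketched as two binary searches: one to pick the segment, one to find the nearest index entry inside it. The data layout below (a sorted list of base offsets and a list of (offset, byte position) pairs) and the function names are assumptions for illustration, not Kafka's file format.

```python
from bisect import bisect_right


def find_segment(base_offsets, target_offset):
    """Pick the segment whose base offset is the largest one <= target_offset."""
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is older than the first segment")
    return base_offsets[i]


def scan_start(index_entries, target_offset):
    """Return the byte position from which to start scanning the segment.

    index_entries is a sorted list of (offset, byte_position) pairs that
    covers only some offsets, i.e. a sparse index.
    """
    offsets = [offset for offset, _ in index_entries]
    i = bisect_right(offsets, target_offset) - 1
    return index_entries[i][1] if i >= 0 else 0
```

For example, with segments starting at offsets 0, 1000, and 2000, offset 1500 falls in the segment that starts at 1000. If that segment's index holds entries for offsets 1000, 1040, and 1080, a reader looking for offset 1050 jumps to the byte position recorded for 1040 and scans forward from there.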

5. Batch I/O

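Both reads and writes are performed in batches. Producers can accumulate records and send them as one request (controlled by settings such as batch.size and linger.ms), which reduces network round-trips, and consumers fetch records in batches as well.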
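The producer-side idea can be sketched as a small accumulator that buffers records until a byte budget is reached. This is a toy model, not the Kafka client: BatchAccumulator and max_batch_bytes are hypothetical names, and a real producer also sends on a timer and keeps a separate buffer per partition.

```python
class BatchAccumulator:
    """Collect records and release them as one batch once a byte budget is reached."""

    def __init__(self, max_batch_bytes: int):
        self.max_batch_bytes = max_batch_bytes
        self._records = []
        self._size = 0

    def append(self, record: bytes):
        """Add a record; return a full batch (list of records) or None."""
        self._records.append(record)
        self._size += len(record)
        if self._size >= self.max_batch_bytes:
            return self.flush()
        return None

    def flush(self):
        """Release whatever is buffered, e.g. when a linger timer fires."""
        batch, self._records, self._size = self._records, [], 0
        return batch
```

With a 10-byte budget, two 4-byte records stay buffered and a third record that pushes the total to 10 bytes releases all three as one batch, so one network request carries what would otherwise have been three.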

6. Batch Compression

Kafka compresses batches of messages (for example with Gzip, Snappy, LZ4, or Zstandard) rather than individual messages. Compressing a whole batch exploits redundancy across messages and reduces network I/O; the consumer decompresses the batch after fetching it.
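Why batches compress better than single messages can be checked with the standard gzip module. The sketch below compares the two approaches on similar JSON messages; the message shape and the function names are invented for this example, and joining messages with newlines stands in for Kafka's real batch format.

```python
import gzip
import json


def compressed_sizes(messages):
    """Return (total bytes when each message is gzipped alone,
    bytes when all messages are gzipped as one batch)."""
    individually = sum(len(gzip.compress(m)) for m in messages)
    as_batch = len(gzip.compress(b"\n".join(messages)))
    return individually, as_batch


def sample_messages(n=100):
    """Build n similar JSON messages, as event streams often contain."""
    return [
        json.dumps({"user_id": i, "event": "page_view", "path": "/home"}).encode()
        for i in range(n)
    ]
```

Each separately compressed message pays the fixed header and trailer cost of the format and cannot share repeated field names with its neighbors, so the batch comes out smaller whenever messages resemble each other.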

Overall, Kafka’s speed stems from sequential disk writes, OS page cache, zero‑copy transfers, partitioned log segments with indexes, and batch processing with compression.

