Kafka High‑Throughput Tricks: Sequential Writes, Zero‑Copy, Partitioning
The article explains how Kafka achieves high throughput: it writes messages sequentially to disk, leverages the OS page cache and zero‑copy system calls, uses partitioned topics for parallelism, batches and compresses records on both the producer and broker sides, and employs asynchronous replication with configurable persistence strategies.
Sequential Writes and Zero‑Copy
Kafka writes messages to disk in strictly sequential order per partition, avoiding random I/O; with purely sequential access, even spinning disks can sustain throughput that approaches memory‑level speeds.
Each partition corresponds to an ordered log file, and Kafka relies on the operating system’s page cache for buffering instead of managing its own cache.
Zero‑copy is achieved via the sendfile() system call, which transfers data directly from the file's kernel buffers to the network socket, skipping the copies into and out of user space; this reduces I/O overhead and CPU usage and increases read throughput.
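The read path above can be sketched with Python's socket.sendfile(), which delegates to the sendfile() syscall where the OS supports it (and falls back to ordinary copies otherwise). The socketpair here is just a stand-in for a consumer connection:

```python
import os
import socket
import tempfile

# Write a sample "log segment" to disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"message-" * 1024)
    path = f.name

# A socketpair stands in for a consumer's network connection.
server, client = socket.socketpair()

with open(path, "rb") as segment:
    # socket.sendfile() uses the sendfile() syscall where available,
    # moving bytes kernel-to-kernel without copying into user space.
    sent = server.sendfile(segment)
server.close()

received = bytearray()
while True:
    chunk = client.recv(65536)
    if not chunk:
        break
    received += chunk
client.close()
os.unlink(path)
```

On the JVM, Kafka gets the same effect through FileChannel.transferTo(): the payload never enters the application's address space.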
Partitioning and Parallel Processing
The core data model is Topic → Partition → Replica. Each partition is stored and consumed independently, and partitions can be distributed across multiple brokers.
Within a consumer group, each consumer can read from different partitions in parallel, allowing horizontal scaling. For example, a topic split into 10 partitions can be consumed by 10 consumers simultaneously, theoretically increasing throughput tenfold.
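A simplified sketch of both ideas: partition_for below is a stand-in for the client's default partitioner (the real one hashes keys with murmur2, not MD5), and assign mimics a round-robin partition assignment within a consumer group. Both function names are this sketch's own:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Deterministic stand-in for Kafka's default partitioner:
    # same key -> same partition, so per-key ordering is preserved.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    # Round-robin assignment within a consumer group:
    # each partition is owned by exactly one consumer.
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 10 partitions spread over 10 consumers -> one partition each,
# so all 10 streams can be consumed in parallel.
plan = assign(list(range(10)), [f"consumer-{i}" for i in range(10)])
```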
Batch Transmission and Compression
Kafka improves network efficiency by batching messages and compressing them.
On the producer side, the batch.size and linger.ms settings aggregate multiple records into a single batch before sending.
On the broker side, batches are written to the log file in one operation, reducing disk I/O.
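The producer-side accumulator behavior can be modeled in a few lines. BatchAccumulator is a toy class invented for this sketch, with parameters mirroring the batch.size and linger.ms settings:

```python
class BatchAccumulator:
    """Toy model of the producer's record accumulator: a batch is sent
    when it reaches batch_size bytes or linger_ms has elapsed."""

    def __init__(self, batch_size: int, linger_ms: int):
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self.records: list[bytes] = []
        self.bytes = 0
        self.opened_at = 0
        self.sent_batches: list[list[bytes]] = []

    def append(self, record: bytes, now_ms: int) -> None:
        if not self.records:
            self.opened_at = now_ms  # batch opens with its first record
        self.records.append(record)
        self.bytes += len(record)
        self._maybe_send(now_ms)

    def poll(self, now_ms: int) -> None:
        # The sender thread calls this periodically to honor linger_ms.
        self._maybe_send(now_ms)

    def _maybe_send(self, now_ms: int) -> None:
        full = self.bytes >= self.batch_size
        lingered = bool(self.records) and now_ms - self.opened_at >= self.linger_ms
        if full or lingered:
            self.sent_batches.append(self.records)  # one network request
            self.records, self.bytes = [], 0

acc = BatchAccumulator(batch_size=32, linger_ms=5)
for _ in range(4):
    acc.append(b"x" * 10, now_ms=0)  # 40 bytes >= batch_size -> one send
```

Whichever trigger fires first wins: a large batch.size amortizes per-request overhead, while linger.ms caps how long any record waits.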
Supported compression codecs include gzip, snappy, lz4, and zstd; on typical batches of repetitive, structured records they can cut network bandwidth usage by 70% or more.
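A quick illustration of why whole-batch compression pays off, using the stdlib gzip codec on a synthetic batch of repetitive JSON records (the actual ratio depends entirely on the payload):

```python
import gzip
import json

# Batches of structured records are highly repetitive, which is
# why compressing the batch as a unit works so well on the wire.
records = [
    json.dumps({"user_id": i, "event": "page_view", "url": "/products/list"}).encode()
    for i in range(1000)
]
raw = b"\n".join(records)
compressed = gzip.compress(raw)  # Kafka tags the batch with its codec
saved = 1 - len(compressed) / len(raw)  # fraction of bandwidth saved
```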
Asynchronous Replication and Configurable Persistence
Kafka's network layer uses Java NIO (non‑blocking I/O) with a selector‑based reactor model, letting a small number of threads handle thousands of connections while minimizing context switches.
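The same reactor pattern can be demonstrated with Python's selectors module, the stdlib analogue of a Java NIO Selector; socketpairs stand in for client connections:

```python
import selectors
import socket

# Single-threaded reactor: one selector multiplexes many connections.
sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(3)]
for i, (_, reader) in enumerate(pairs):
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data=i)

# Traffic arrives on two of the three connections...
pairs[0][0].sendall(b"ping")
pairs[2][0].sendall(b"ping")

# ...and one select() call tells the single thread exactly which
# sockets are readable -- no per-connection thread, no busy-wait.
ready = {key.data for key, _ in sel.select(timeout=1)}

for a, b in pairs:
    a.close()
    b.close()
sel.close()
```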
Zero‑copy sendfile() is also used for replicating data between brokers, moving data directly from disk to the network interface.
Broker responses are batched as well, so multiple acknowledgments travel in a single network round trip, reducing per-request overhead and keeping CPU utilization low even under high concurrency.
Persistence policies can be tuned (e.g., producer acknowledgment levels, log flush intervals, log retention limits) to balance durability against performance.
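A sketch of the usual trade-off points: the setting names below are real Kafka configs, but the groupings and concrete values are illustrative only, not recommendations.

```python
# Producer-side durability profiles (illustrative groupings).
fast_but_lossy = {
    "acks": "0",        # fire-and-forget: no broker acknowledgment
    "retries": 0,
}
balanced = {
    "acks": "1",        # leader persists the write before acking
}
durable = {
    "acks": "all",              # all in-sync replicas must have the write
    "enable.idempotence": True, # no duplicates on producer retry
}

# Broker/topic-side knobs, tuned separately:
broker_side = {
    "min.insync.replicas": 2,      # floor for acks=all to succeed
    "log.retention.hours": 168,    # keep data for 7 days
    "log.segment.bytes": 1 << 30,  # roll log segments at 1 GiB
}
```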
Mike Chen's Internet Architecture
