Why Is Kafka So Fast? 7 Core Techniques Behind Its High Throughput
This article explains how Kafka achieves million‑message‑per‑second throughput by leveraging zero‑copy I/O, an append‑only log, batch processing, compression, consumer pull optimization, unflushed memory buffers, and JVM garbage‑collection tuning, detailing each mechanism and its impact on performance.
Kafka is a publish‑subscribe messaging system capable of handling millions of messages per second. Its remarkable throughput stems from a combination of low‑level system optimizations and architectural choices, which are explored in seven key areas.
1. Zero‑Copy Technology
Zero‑copy avoids copying data between kernel space and user space: the kernel transfers the bytes itself, which reduces CPU usage, context switches, and I/O latency. Kafka uses it through Java's FileChannel.transferTo(), which maps to sendfile() on Linux, to move log data from the page cache to the network socket without passing through the broker's heap.
Figure: producer writes data to disk (steps 1.1‑1.3), consumer reads without zero‑copy (step 2), and consumer reads with zero‑copy (step 3).
Data flow with zero‑copy:
Disk → OS page cache (skipped when the data is already cached)
OS page cache → socket buffer (via sendfile(), never entering user space)
Socket buffer → NIC → Consumer
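The flow above can be sketched in Python. This is a minimal illustration, not Kafka's implementation: Kafka runs on the JVM, while this sketch relies on socket.sendfile(), which delegates to os.sendfile() on platforms that support it. The helper name send_log_slice, the file contents, and the socket pair standing in for a consumer connection are all assumptions made for the example.

```python
import os
import socket
import tempfile


def send_log_slice(sock, path, offset, count):
    """Send `count` bytes of the file at `path`, starting at `offset`, over `sock`."""
    with open(path, "rb") as segment:
        # socket.sendfile() delegates to os.sendfile() where the platform
        # supports it, so the bytes are not read into a Python buffer first.
        return sock.sendfile(segment, offset=offset, count=count)


def demo():
    # Hypothetical "log segment" holding three 9-byte records.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"record-0|record-1|record-2|")
        path = f.name
    broker_side, consumer_side = socket.socketpair()
    try:
        with broker_side, consumer_side:
            # Serve only the second record, as a broker would serve a fetch
            # that starts at a given position in the segment.
            sent = send_log_slice(broker_side, path, offset=9, count=9)
            received = consumer_side.recv(1024)
    finally:
        os.unlink(path)
    return sent, received
```

Calling demo() sends the middle record from the file to the receiving socket without the application ever holding the record bytes in its own buffer on the sending side.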
2. Append‑Only Log Structure
Kafka persists data by appending records sequentially to log files rather than performing random writes. Sequential I/O is far faster than random I/O on spinning disks, where it avoids seek time, and it remains faster on SSDs, allowing higher write throughput.
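A minimal sketch of an append‑only log, assuming a single writer and a fresh file. The class name AppendOnlyLog, the 4‑byte length prefix, and the in‑memory position list are illustrative choices, not Kafka's on‑disk format.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Length-prefixed records appended to one file; offsets are record numbers."""

    def __init__(self, path):
        # "a+b": every write goes to the end of the file, reads may seek anywhere.
        self._file = open(path, "a+b")
        self._positions = []  # record offset -> byte position in the file

    def append(self, payload):
        self._file.seek(0, os.SEEK_END)
        position = self._file.tell()
        self._file.write(struct.pack(">I", len(payload)) + payload)
        self._positions.append(position)
        return len(self._positions) - 1  # offset assigned to this record

    def read(self, offset):
        self._file.flush()  # make buffered appends visible to the read
        self._file.seek(self._positions[offset])
        (length,) = struct.unpack(">I", self._file.read(4))
        return self._file.read(length)

    def close(self):
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        log = AppendOnlyLog(os.path.join(directory, "segment.log"))
        offsets = [log.append(p) for p in (b"a", b"bb", b"ccc")]
        middle = log.read(1)
        log.close()
    return offsets, middle
```

Writes never modify earlier bytes, so the writer only ever moves forward through the file, while readers locate a record by its offset.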
3. Message Batching
Producers group multiple messages into a batch before sending, and brokers write each batch to disk as a unit, reducing network round trips and disk I/O operations.
Producer Side
The producer's send() method queues messages in a buffer and returns immediately; a background sender thread transmits the buffered messages as batches.
batch.size – maximum size of a batch for a single partition (default 16 KB)
buffer.memory – total memory for the producer buffer (default 32 MB)
linger.ms – maximum wait time before sending a batch that is not yet full (default 0 ms; 5 ms since Kafka 4.0)
compression.type – compression algorithm for the batch (default none)
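How batch.size and linger.ms interact can be sketched with a small accumulator. This is a simplified, single‑partition model without threads; the class name BatchAccumulator and its methods are hypothetical and do not mirror the Kafka client's internals.

```python
import time
from typing import Callable, List, Optional


class BatchAccumulator:
    """Collects records and releases them as batches (one partition, no threads)."""

    def __init__(self, batch_size: int = 16384, linger_ms: float = 0,
                 clock: Callable[[], float] = time.monotonic):
        self.batch_size = batch_size
        self.linger_s = linger_ms / 1000.0
        self._clock = clock
        self._records: List[bytes] = []
        self._bytes = 0
        self._opened_at = 0.0

    def append(self, record: bytes) -> Optional[List[bytes]]:
        """Buffer a record; return a batch once batch_size bytes are queued."""
        if not self._records:
            self._opened_at = self._clock()  # linger timer starts with the first record
        self._records.append(record)
        self._bytes += len(record)
        return self._drain() if self._bytes >= self.batch_size else None

    def poll(self) -> Optional[List[bytes]]:
        """Return a partial batch once it has waited at least linger_ms."""
        if self._records and self._clock() - self._opened_at >= self.linger_s:
            return self._drain()
        return None

    def _drain(self) -> List[bytes]:
        batch, self._records, self._bytes = self._records, [], 0
        return batch
```

A batch leaves the accumulator when it is full or when it has lingered long enough, whichever comes first; raising linger_ms trades a little latency for larger batches.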
Broker Side
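When a broker receives a batch, it appends the whole batch to the partition's log segment as a unit rather than splitting it into individual messages, so one write covers many records. The write lands in the OS page cache; memory‑mapped files are used for the segment's index files rather than for the log data itself.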
Consumer Side
Consumers pull batches from brokers; the client then unpacks the batch and delivers messages one by one to the application.
fetch.min.bytes – minimum bytes per fetch request (default 1 B)
fetch.max.bytes – maximum bytes per fetch request (default 50 MB)
fetch.max.wait.ms – maximum wait time for a fetch (default 500 ms)
max.partition.fetch.bytes – max bytes per partition per fetch (default 1 MB)
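The interplay between fetch.min.bytes and fetch.max.wait.ms can be expressed as a single condition. The function below is a simplified model with a hypothetical name; a real broker considers more than these two inputs.

```python
def fetch_ready(available_bytes, waited_ms, fetch_min_bytes=1, fetch_max_wait_ms=500):
    """Simplified model: answer a fetch once enough data exists or the wait expires."""
    return available_bytes >= fetch_min_bytes or waited_ms >= fetch_max_wait_ms


# With the defaults, a single available byte is enough to answer at once.
immediate = fetch_ready(available_bytes=1, waited_ms=0)
# With a larger fetch.min.bytes, the broker holds the request until the timeout.
held = fetch_ready(available_bytes=512, waited_ms=10, fetch_min_bytes=1024)
```

Raising fetch.min.bytes makes each response carry more data at the price of up to fetch.max.wait.ms of added latency on quiet topics.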
4. Batch Compression
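Kafka can compress whole batches (gzip, snappy, lz4, or zstd) before transmission, saving network bandwidth and disk space at the cost of additional CPU for compression and decompression. Compressing a batch rather than single messages lets the codec exploit redundancy across messages. The producer chooses the codec via compression.type, the broker by default stores the batch as the producer compressed it, and the consumer decompresses it using the codec recorded in the batch header.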
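The benefit of compressing a batch instead of individual messages can be checked with a short experiment. This sketch uses Python's zlib (DEFLATE, the algorithm behind gzip) and made‑up JSON‑like messages; the function name and sample data are assumptions for illustration only.

```python
import zlib


def compressed_sizes(messages):
    """Return (total size when each message is compressed alone, size as one batch)."""
    individually = sum(len(zlib.compress(m)) for m in messages)
    as_batch = len(zlib.compress(b"".join(messages)))
    return individually, as_batch


# Hypothetical messages that share most of their structure, as event streams often do.
messages = [
    b'{"user_id": %d, "event": "page_view", "path": "/home"}' % i
    for i in range(100)
]
individually, as_batch = compressed_sizes(messages)
```

Because the messages repeat the same field names, the batch compresses to far fewer bytes than the sum of the individually compressed messages.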
5. Consumer Pull Optimization
Consumers use a pull model, requesting data when ready, which lets them control consumption rate and reduces server load. Typical consumer responsibilities include subscribing to topics, sending heartbeats, fetching data, and committing offsets.
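A pull loop can be sketched with an in‑memory stand‑in for a partition. The classes InMemoryPartition and PullConsumer and the function consume_all are hypothetical; they only illustrate that the consumer decides when to fetch and how far it has progressed.

```python
class InMemoryPartition:
    """Stand-in for a broker partition: an ordered list of records."""

    def __init__(self, records):
        self._records = list(records)

    def fetch(self, offset, max_records):
        return self._records[offset:offset + max_records]


class PullConsumer:
    """Fetches from its committed offset and advances only on commit."""

    def __init__(self, partition, max_records=2):
        self._partition = partition
        self._max_records = max_records
        self.committed_offset = 0

    def poll(self):
        return self._partition.fetch(self.committed_offset, self._max_records)

    def commit(self, count):
        self.committed_offset += count


def consume_all(consumer, handle):
    """Pull batches at the consumer's own pace until no data is left."""
    while True:
        batch = consumer.poll()
        if not batch:
            break
        for record in batch:
            handle(record)
        consumer.commit(len(batch))  # commit only after the batch was processed
```

The broker side stays passive here: it never tracks per‑consumer delivery state and only answers fetches for the offset the consumer asks for.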
6. Unflushed Buffered Writes
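Kafka writes incoming messages to the OS page cache and, by default, does not force a flush on every write; it relies on the OS to write dirty pages to disk later and on replication for durability, which keeps write latency low. Parameters controlling flush behavior include:
log.flush.interval.messages – number of messages written to a log partition before a flush is forced (broker setting)
log.flush.interval.ms – maximum time a message may stay unflushed before a flush is forced (broker setting)
producer.type – legacy Scala producer setting: sync (send in the calling thread) or async (queue and send in the background); the current Java producer has no such option and uses acks to control how long a send waits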
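The difference between writing and forcing a flush can be sketched with a writer that calls fsync only every N messages, loosely modeled on log.flush.interval.messages. The class name FlushingLogWriter and the demo values are assumptions; this is not how the broker is implemented.

```python
import os
import tempfile


class FlushingLogWriter:
    """Appends to a file and forces data to disk only every N messages."""

    def __init__(self, path, flush_interval_messages):
        self._file = open(path, "ab")
        self._interval = flush_interval_messages
        self._unflushed = 0
        self.fsync_count = 0

    def append(self, payload):
        self._file.write(payload)  # buffered; reaches the OS page cache later
        self._unflushed += 1
        if self._unflushed >= self._interval:
            self.force()

    def force(self):
        self._file.flush()              # Python buffer -> OS page cache
        os.fsync(self._file.fileno())   # OS page cache -> storage device
        self._unflushed = 0
        self.fsync_count += 1

    def close(self):
        self._file.close()


def demo(n_messages=10, interval=3):
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "segment.log")
        writer = FlushingLogWriter(path, interval)
        for _ in range(n_messages):
            writer.append(b"x" * 8)
        writer.close()
        return writer.fsync_count, os.path.getsize(path)
```

A larger interval means fewer expensive fsync calls per message, at the cost of more data at risk if the machine loses power before the OS writes it out.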
7. JVM GC Optimization
Kafka runs on the JVM, so garbage‑collection tuning is critical. Recommended settings are:
Heap size: 4 GB–6 GB (Kafka relies on OS page cache, not heap)
Off‑heap memory: ~8 GB for I/O buffers and compression
GC algorithm: G1GC (targets <200 ms pauses)
Key G1 flags: -XX:MaxGCPauseMillis=200, -XX:InitiatingHeapOccupancyPercent=45, -XX:G1ReservePercent=10, -XX:G1HeapRegionSize=2M
References
https://medium.com/swlh/why-kafka-is-so-fast-bde0d987cd03
https://blog.bytebytego.com/p/why-is-kafka-fast
https://blog.csdn.net/csdnnews/article/details/104471147
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
