How Kafka Leverages Linux Page Cache for High Throughput and Low Latency
This article explains why Kafka achieves remarkable speed by relying on Linux page cache, detailing the differences between page and buffer caches, Kafka's zero‑copy I/O path, relevant kernel parameters, and tuning recommendations for optimal backend performance.
Kafka's high throughput and low latency owe much to its efficient use of the Linux page cache. To see why, it helps to first review what the page cache and the buffer cache are, and how they appear in the output of the free -m command.
Running free -m shows the columns buffers (buffer cache) and cached (page cache). The page cache stores pages of file data, while the buffer cache stores raw block device data. Both accelerate I/O by keeping recently used data in memory: writes land in memory first and are flushed to disk later, and repeated reads are served without touching the disk.
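The two columns can also be read directly from /proc/meminfo, which is where free gets its numbers. The following is a minimal sketch, assuming the usual "Name: value kB" layout of that file; the function names parse_meminfo and cache_summary_mib are illustrative and not part of any tool. Note that newer versions of free may merge both values into a single buff/cache column, so the totals can differ slightly from what this sketch reports.

```python
def parse_meminfo(text):
    """Return {field_name: kibibytes} for every 'Name:  123 kB' line."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary_mib(text):
    """Report the buffer cache and page cache sizes in MiB, like free -m."""
    info = parse_meminfo(text)
    return {
        "buffers": info.get("Buffers", 0) // 1024,
        "cached": info.get("Cached", 0) // 1024,
    }
```

On a Linux host, cache_summary_mib(open("/proc/meminfo").read()) gives the current values.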
When writing, data is marked dirty in the page cache and later written back to storage (write‑back); when reading, the system checks the cache first and only accesses disk on a miss, with LRU eviction handling memory pressure.
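The write-back and LRU behaviour described above can be illustrated with a toy model. This is only a sketch: the class ToyPageCache and its dict-based "disk" are hypothetical, and the kernel's real reclaim logic is considerably more elaborate than a strict LRU list.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy write-back cache: dirty pages reach 'disk' only on flush or eviction."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # dict page_no -> bytes, stands in for storage
        self.pages = OrderedDict()  # page_no -> (data, dirty), least recent first
        self.disk_reads = 0

    def _store(self, page_no, data, dirty):
        self.pages[page_no] = (data, dirty)
        self.pages.move_to_end(page_no)  # mark as most recently used
        while len(self.pages) > self.capacity:
            old_no, (old_data, old_dirty) = self.pages.popitem(last=False)
            if old_dirty:
                self.disk[old_no] = old_data  # write back before dropping

    def write(self, page_no, data):
        # A write only touches memory and marks the page dirty.
        self._store(page_no, data, dirty=True)

    def read(self, page_no):
        if page_no in self.pages:  # cache hit: no disk access
            data, dirty = self.pages[page_no]
            self._store(page_no, data, dirty)
            return data
        self.disk_reads += 1       # cache miss: go to disk
        data = self.disk[page_no]
        self._store(page_no, data, dirty=False)
        return data

    def flush(self):
        # Plays the role of the flusher threads: persist all dirty pages.
        for page_no, (data, dirty) in list(self.pages.items()):
            if dirty:
                self.disk[page_no] = data
                self.pages[page_no] = (data, False)
```

A page that is written and then read back while still cached never causes a disk read in this model, which is the property the rest of the article relies on.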
Before Linux 2.4 the two caches were completely separate, causing duplicate caching of the same data; after 2.4 they were merged so a file page in the page cache also satisfies the block cache, simplifying the model used in the rest of the article.
Kafka chooses to rely on the OS page cache instead of managing its own cache for three main reasons: object overhead in the JVM would waste space, JVM‑managed caches would be hampered by garbage collection and large heap sizes, and custom caches would be lost if the process crashes.
On the write path, the broker appends messages received from producers using the pwrite() system call (reached through Java NIO FileChannel.write()), which writes into the page cache rather than directly to disk. On the read path, the broker serves consumer fetches with sendfile() (Java FileChannel.transferTo()), which transfers data from the page cache to the socket without copying it through user space (zero-copy).
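The same two system calls are reachable from Python, which makes the path easy to try out. The sketch below is not Kafka's code: append_at, send_range, and demo are hypothetical names, the "log" is a temporary file, and a local socket pair stands in for the consumer connection. socket.sendfile() uses os.sendfile() where the platform provides it and falls back to ordinary send() otherwise.

```python
import os
import socket
import tempfile


def append_at(fd, offset, payload):
    """Write payload at a byte offset with pwrite(); return the next offset."""
    view = memoryview(payload)
    while view:
        written = os.pwrite(fd, view, offset)  # data lands in the page cache
        offset += written
        view = view[written:]
    return offset


def send_range(sock, path, offset, count):
    """Send count bytes of the file, starting at offset, over a stream socket."""
    with open(path, "rb") as log_file:
        # Uses sendfile() when available, so the bytes are not copied
        # through a user-space buffer.
        return sock.sendfile(log_file, offset=offset, count=count)


def demo():
    """Append two records, then serve only the second one to a 'consumer'."""
    fd, path = tempfile.mkstemp()
    left, right = socket.socketpair()
    try:
        end = append_at(fd, 0, b"hello")
        end = append_at(fd, end, b"world")
        sent = send_range(left, path, offset=5, count=5)
        return end, sent, right.recv(16)
    finally:
        left.close()
        right.close()
        os.close(fd)
        os.remove(path)
```

Because the second record was written moments before it is sent, the read side is served from memory in this demo, mirroring the balanced produce and consume case.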
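Kernel flusher threads and explicit sync() / fsync() calls eventually write dirty pages to disk. Once flushed, the data survives a machine crash; until then, dirty pages survive a crash of the broker process (the page cache belongs to the kernel) but not a kernel crash or power loss. If a consumer requests a page that is not in the cache, the kernel reads it from disk and pre-fetches adjacent blocks into the cache (read-ahead).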
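The difference between "written to the page cache" and "flushed to disk" can be made concrete with an explicit fsync(). The sketch below is illustrative only: FlushingAppender and its flush_every parameter are hypothetical, and it does not reproduce how Kafka decides when to flush.

```python
import os
import tempfile


class FlushingAppender:
    """Append records to a file and fsync() after every flush_every appends."""

    def __init__(self, path, flush_every):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.flush_every = flush_every
        self.unflushed = 0
        self.fsync_calls = 0

    def append(self, record):
        os.write(self.fd, record)  # returns once the data is in the page cache
        self.unflushed += 1
        if self.unflushed >= self.flush_every:
            self.flush()

    def flush(self):
        os.fsync(self.fd)          # blocks until dirty pages reach the device
        self.unflushed = 0
        self.fsync_calls += 1

    def close(self):
        if self.unflushed:
            self.flush()
        os.close(self.fd)


def demo():
    """Append five records with a flush every two; return (fsync count, bytes)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        log = FlushingAppender(path, flush_every=2)
        for i in range(5):
            log.append(b"m%d\n" % i)
        log.close()
        with open(path, "rb") as f:
            return log.fsync_calls, f.read()
    finally:
        os.remove(path)
```

A smaller flush_every narrows the window of unflushed data at the cost of more blocking fsync() calls; leaving flushing to the kernel keeps appends fast.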
The key conclusion is that when producer and consumer rates are balanced, the entire produce‑consume cycle can occur almost entirely within the broker’s page cache, minimizing disk I/O and benefiting from sequential write patterns.
Operational recommendations include setting a modest JVM heap (5‑8 GB) for Kafka so the majority of system memory can be devoted to page cache, and monitoring “lagging consumers” that force cold‑data reads and pollute the cache.
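A rough back-of-the-envelope calculation shows why heap size and consumer lag matter together. The sketch below assumes, as a simplification, that all memory not taken by the JVM heap or reserved for other uses is available as page cache for Kafka's log files; the function names and the sample figures are illustrative, not measurements.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_cache_budget(total_ram, jvm_heap, other_reserved):
    """Bytes left for the page cache after the heap and other reservations."""
    return max(total_ram - jvm_heap - other_reserved, 0)


def cache_window_seconds(cache_bytes, write_bytes_per_sec):
    """How many seconds of freshly written data fit in the cache."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return cache_bytes / write_bytes_per_sec


def reads_likely_cached(consumer_lag_seconds, window_seconds):
    """A consumer lagging less than the window should mostly hit the cache."""
    return consumer_lag_seconds < window_seconds


# Sample figures: 64 GiB host, 6 GiB heap, 2 GiB kept for everything else,
# and a sustained ingest rate of 100 MiB/s.
budget = page_cache_budget(64 * GIB, 6 * GIB, 2 * GIB)
window = cache_window_seconds(budget, 100 * MIB)
```

With these sample figures the cache holds a little over nine minutes of writes, so a consumer that falls an hour behind would be reading cold data from disk and displacing hot pages in the process.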
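Four kernel parameters control page-cache flushing behavior: /proc/sys/vm/dirty_writeback_centisecs (how often the flusher threads wake up), /proc/sys/vm/dirty_expire_centisecs (how long a page may stay dirty before it is eligible for flushing), /proc/sys/vm/dirty_background_ratio (the percentage of dirty memory at which background flushing starts), and /proc/sys/vm/dirty_ratio (the percentage at which writing processes are blocked until pages are flushed). The recommended approach is to tune dirty_expire_centisecs and dirty_background_ratio so that background flushing keeps up with the write rate, and to make sure the dirty_ratio threshold is never reached, since hitting it stalls writes.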
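Before changing any of these values it is useful to see what a host is currently running with. The following is a small sketch that reads the four files named above; read_dirty_settings and describe are hypothetical helper names, and the base_dir parameter exists only so the function can be pointed at a different directory.

```python
import os

DIRTY_PARAMS = (
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_dirty_settings(base_dir="/proc/sys/vm"):
    """Read the four flushing parameters as integers."""
    settings = {}
    for name in DIRTY_PARAMS:
        with open(os.path.join(base_dir, name)) as f:
            settings[name] = int(f.read().strip())
    return settings


def describe(settings):
    """Convert raw values to friendlier units and flag an inverted setup."""
    background = settings["dirty_background_ratio"]
    blocking = settings["dirty_ratio"]
    return {
        # Centiseconds are hundredths of a second.
        "flusher_wakeup_seconds": settings["dirty_writeback_centisecs"] / 100,
        "max_dirty_age_seconds": settings["dirty_expire_centisecs"] / 100,
        "background_flush_percent": background,
        "blocking_flush_percent": blocking,
        # Background flushing should start well before writers get blocked.
        "background_below_blocking": background < blocking,
    }
```

On a Linux host, describe(read_dirty_settings()) summarizes the live configuration; the values themselves are typically changed with sysctl or by writing to the same files as root.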
Overall, understanding and configuring Linux page cache is essential for extracting maximum throughput from Kafka deployments.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
