Why Kafka’s Disk I/O Isn’t Slow: Leveraging OS Page Cache for High Performance
This article explains how Kafka’s design uses the operating system’s page cache and sequential disk writes to achieve fast, reliable persistence, debunking the myth that disk I/O is inherently slow and outlining practical configuration tips.
1 Background
After reading the persistence section of the official Kafka documentation, I found it rather abstract, so I added subheadings, grouped the content by topic, rewrote obscure sentences in plain language with examples, and added a summary.
2 Disk I/O speed depends on usage, not perception
Kafka relies heavily on the file system to store and cache messages. The common intuition is that disks are slow, which makes people doubt that a persistent structure can deliver competitive performance. In reality, disks can be much slower or much faster than expected depending on how they are used, and a well‑designed on‑disk structure can often be as fast as the network.
Sequential writes can be faster than random memory access
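Over the past decade, hard‑drive throughput has been diverging from seek latency: drives stream data ever faster, but each seek still costs milliseconds. On a JBOD configuration with a six‑drive 7200 rpm SATA RAID‑5 array, linear writes reach about 600 MB/s, while random writes reach only about 100 KB/s, a gap of more than 6000×. Modern operating systems further optimize linear access with read‑ahead, which prefetches large blocks, and write‑behind, which groups small logical writes into large physical writes. Some published measurements even show sequential disk access outperforming random memory access.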
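A back‑of‑the‑envelope model shows where the gap comes from. The Python sketch below assumes a single drive with an 8 ms average seek‑plus‑rotation delay and a 100 MB/s sustained transfer rate; both are illustrative round numbers, not figures from the Kafka documentation, and the model ignores RAID, caching, and batching. Under those assumptions, 4 KB random writes reach only about 0.5 MB/s, roughly two hundred times slower than streaming the same data sequentially.

```python
# Toy model of a spinning disk: every random write pays a seek plus
# rotational delay, while a sequential stream pays it (roughly) once.
# SEEK_S and TRANSFER_BPS are illustrative assumptions, not measurements.
SEEK_S = 0.008          # assumed average seek + rotational latency (8 ms)
TRANSFER_BPS = 100e6    # assumed sustained media transfer rate (100 MB/s)


def effective_throughput(block_bytes: int, sequential: bool,
                         seek_s: float = SEEK_S,
                         transfer_bps: float = TRANSFER_BPS) -> float:
    """Return bytes/second for a long run of equal-sized block writes."""
    transfer_s = block_bytes / transfer_bps
    # Sequential writes amortize the single initial seek to ~0 per block.
    per_block_s = transfer_s if sequential else seek_s + transfer_s
    return block_bytes / per_block_s


if __name__ == "__main__":
    seq = effective_throughput(4096, sequential=True)
    rnd = effective_throughput(4096, sequential=False)
    print(f"sequential: {seq / 1e6:.1f} MB/s")
    print(f"random:     {rnd / 1e3:.1f} KB/s")
    print(f"model ratio:   {seq / rnd:.0f}x")
    # The figures quoted above: 600 MB/s linear vs. 100 KB/s random.
    print(f"quoted ratio:  {600e6 / 100e3:.0f}x")
```

The model is only meant to show the mechanism (a seek on every random write), not to reproduce the 600 MB/s vs. 100 KB/s measurement.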
Using OS page cache to accelerate disk I/O
Modern operating systems are increasingly aggressive about using main memory for disk caching: they divert all free memory to the page cache and reclaim it with little performance penalty when applications need it. All disk reads and writes pass through this unified cache, which cannot easily be turned off without direct I/O. As a result, even if a process maintains its own in‑memory cache, the same data will likely also sit in the OS page cache, so it is effectively stored twice.
When Kafka writes data to the file system, the write completes synchronously once the data is in the page cache; the OS then flushes the dirty pages to disk asynchronously.
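This write path can be sketched with the POSIX‑style calls in Python's os module. The AppendOnlyLog class is hypothetical, not Kafka code: os.write returns once the bytes are in the page cache, and the data reaches disk when the kernel flushes dirty pages in the background, or immediately if force() calls os.fsync.

```python
import os
import tempfile


class AppendOnlyLog:
    """Minimal append-only file that leaves flushing to the OS (toy sketch)."""

    def __init__(self, path: str) -> None:
        # O_APPEND makes every write land at the end of the file, so the
        # access pattern is purely sequential. O_BINARY only exists on Windows.
        flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_BINARY", 0)
        self.fd = os.open(path, flags, 0o644)

    def append(self, record: bytes) -> None:
        # os.write returns once the bytes are copied into the kernel's page
        # cache; the kernel writes the dirty pages to disk later.
        view = memoryview(record)
        while view:
            written = os.write(self.fd, view)
            view = view[written:]

    def force(self) -> None:
        # Explicitly push this file's dirty pages to stable storage.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "segment.log")
    log = AppendOnlyLog(path)
    log.append(b"message-1\n")
    log.append(b"message-2\n")
    # No fsync yet, but a reader already sees the data: reads are served
    # from the same page cache the writes went into.
    with open(path, "rb") as f:
        print(f.read())  # b'message-1\nmessage-2\n'
    log.force()
    log.close()
```

The read in the example sees both messages before any fsync, because reads and writes share the same page cache.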
3 Drawbacks of caching data in the JVM
Kafka runs on the JVM, and anyone familiar with Java memory knows two facts:
The memory overhead of objects is very high, often doubling the size of the stored data, or worse.
As heap data grows, garbage collection becomes increasingly costly, consuming resources that could be used for business logic.
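Object overhead is easy to observe outside Java too. The sketch below uses CPython rather than the JVM (an assumption made only to keep the examples in one language), so the exact numbers differ, but the effect is the same: each small message stored as its own heap object carries a header plus a pointer from its container, while the same bytes packed into one buffer cost almost nothing extra. The footprint helper and the message format are hypothetical, and garbage‑collection cost is not modeled.

```python
import sys


def footprint(n: int = 10_000) -> tuple[int, int, int]:
    """Return (payload, as-objects, packed) sizes in bytes for n small messages."""
    # Hypothetical 9-byte messages: b"msg-00000", b"msg-00001", ...
    messages = [f"msg-{i:05d}".encode() for i in range(n)]
    payload = sum(len(m) for m in messages)
    # One heap object per message, each with its own header, plus the
    # list's array of pointers to them.
    as_objects = sys.getsizeof(messages) + sum(sys.getsizeof(m) for m in messages)
    # The same bytes stored back to back in a single buffer.
    packed = sys.getsizeof(b"".join(messages))
    return payload, as_objects, packed


if __name__ == "__main__":
    payload, as_objects, packed = footprint()
    print(f"raw payload:     {payload:>8} bytes")
    print(f"one object each: {as_objects:>8} bytes ({as_objects / payload:.1f}x)")
    print(f"packed buffer:   {packed:>8} bytes")
```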
4 Advantages of caching data in the file system
Using the file system with page cache is superior to maintaining in‑memory caches for several reasons:
Kafka gets automatic access to all free memory through the page cache, which at least doubles the available cache compared with an in‑heap cache.
Storing messages as a compact byte structure rather than as individual objects likely doubles the effective capacity again.
On a 32 GB machine, this yields a cache of up to 28–30 GB without any GC penalty.
The cache remains hot after a broker restart because the OS page cache persists across process restarts.
Code complexity is greatly reduced since consistency between cache and file system is handled by the OS.
When disk reads are mostly sequential, as Kafka consumers' reads are, OS read‑ahead fills the cache with the data that will be requested next.
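The compact‑byte‑structure and read‑ahead points can be sketched together. The length‑prefixed layout below (a 4‑byte big‑endian length followed by the payload) is a hypothetical format chosen for illustration; Kafka's actual on‑disk record format is more elaborate. read_segment scans the file front to back in large chunks and, where the platform supports it, calls os.posix_fadvise with POSIX_FADV_SEQUENTIAL to hint that read‑ahead will pay off. The kernel usually detects sequential reads on its own, so the hint is optional.

```python
import os
import struct
import tempfile
from typing import Iterator, List

# Hypothetical record layout for illustration only: a 4-byte big-endian
# length prefix followed by the payload, records stored back to back.
_LEN = struct.Struct(">I")


def encode(records: List[bytes]) -> bytes:
    """Pack records into one contiguous buffer (no per-record objects)."""
    out = bytearray()
    for record in records:
        out += _LEN.pack(len(record))
        out += record
    return bytes(out)


def scan(buf: bytes) -> Iterator[bytes]:
    """Yield records front to back: a strictly sequential access pattern."""
    pos = 0
    while pos < len(buf):
        (size,) = _LEN.unpack_from(buf, pos)
        pos += _LEN.size
        yield buf[pos:pos + size]
        pos += size


def read_segment(path: str, chunk: int = 1 << 20) -> bytes:
    """Read a whole file sequentially in large chunks."""
    with open(path, "rb", buffering=0) as f:
        if hasattr(os, "posix_fadvise"):
            # Unix-only hint: we will read front to back, so the kernel may
            # enlarge its read-ahead window. Offset 0, length 0 = whole file.
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        parts = []
        while True:
            block = f.read(chunk)
            if not block:
                break
            parts.append(block)
    return b"".join(parts)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "segment.bin")
    with open(path, "wb") as f:
        f.write(encode([b"order-1", b"order-2", b"order-3"]))
    for record in scan(read_segment(path)):
        print(record)
```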
5 Design recommendations for Kafka’s file‑system storage and caching
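Instead of keeping as much data as possible in memory and hurriedly flushing it to the file system when memory runs out, Kafka inverts the approach: all data is written immediately to a persistent log on the file system, without necessarily being flushed to disk. In practice this just means the data is transferred into the kernel's page cache, and the OS writes it to disk later.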
6 Does this synchronous‑to‑page‑cache, asynchronous‑flush design risk data loss if the OS crashes?
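If only the broker process crashes before the page cache has been flushed, no data is lost: the dirty pages belong to the kernel, which still writes them to disk, and the restarted broker finds the data in its log files. However, if the operating system or the physical machine crashes before the flush, the data still sitting in the page cache is lost.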
Kafka mitigates this risk with replication: the same data is stored on multiple machines so that a failure of one does not result in data loss.
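The failure cases in this section can be captured in a toy model. Machine and replicate are hypothetical names, not Kafka APIs; the model treats the page cache as a list that survives a broker‑process crash but not a machine crash, and assumes a record counts as written only once every replica holds it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    """Toy broker host: a page cache in RAM plus a log file on disk."""
    page_cache: List[bytes] = field(default_factory=list)
    disk: List[bytes] = field(default_factory=list)

    def write(self, record: bytes) -> None:
        # Returns as soon as the record is in the page cache.
        self.page_cache.append(record)

    def os_flush(self) -> None:
        # What the kernel eventually does in the background.
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def broker_crash(self) -> None:
        # Only the broker process dies. The dirty pages belong to the kernel,
        # which still flushes them (modeled here as an immediate flush).
        self.os_flush()

    def machine_crash(self) -> None:
        # OS crash or power loss: unflushed pages vanish.
        self.page_cache.clear()

    def readable(self) -> List[bytes]:
        return self.disk + self.page_cache


def replicate(replicas: List[Machine], record: bytes) -> None:
    # Assumption: a record counts as written only once every replica has it.
    for machine in replicas:
        machine.write(record)


if __name__ == "__main__":
    replicas = [Machine() for _ in range(3)]
    replicate(replicas, b"payment-42")
    replicas[0].machine_crash()
    print(replicas[0].readable())                # []
    print([m.readable() for m in replicas[1:]])  # [[b'payment-42'], [b'payment-42']]
```

Losing one machine's page cache loses nothing overall, because the other two replicas still hold the record.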
7 Can we control the flush behavior?
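Yes. Kafka’s broker configuration provides log.flush.interval.messages and log.flush.interval.ms, which force the page cache to be flushed to disk (an fsync) after a given number of messages or a given amount of time. By default both are effectively disabled, so the broker never forces a flush and leaves it to the OS; the Kafka documentation recommends keeping these defaults and relying on replication for durability, since forced flushes cost throughput.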
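The decision these two settings control can be sketched as a small policy object. FlushPolicy is a hypothetical helper, not a Kafka class: it reports that a flush is due once the unflushed message count reaches interval_messages, or once interval_ms has elapsed since the last flush. The default interval_messages mirrors Kafka's default of Long.MAX_VALUE for log.flush.interval.messages; treating a missing interval_ms as "no time trigger" is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlushPolicy:
    """Hypothetical helper mirroring the two broker settings above.

    interval_messages ~ log.flush.interval.messages
    interval_ms       ~ log.flush.interval.ms (None = no time-based trigger)
    """
    interval_messages: int = 2**63 - 1  # Java's Long.MAX_VALUE: effectively never
    interval_ms: Optional[int] = None

    def should_flush(self, unflushed: int, last_flush_ms: int, now_ms: int) -> bool:
        if unflushed == 0:
            return False  # nothing dirty, nothing to fsync
        if unflushed >= self.interval_messages:
            return True   # message-count trigger
        return (self.interval_ms is not None
                and now_ms - last_flush_ms >= self.interval_ms)  # time trigger


if __name__ == "__main__":
    policy = FlushPolicy(interval_messages=10_000, interval_ms=1_000)
    print(policy.should_flush(unflushed=10_000, last_flush_ms=0, now_ms=10))   # True
    print(policy.should_flush(unflushed=5, last_flush_ms=0, now_ms=1_500))     # True
    print(policy.should_flush(unflushed=5, last_flush_ms=0, now_ms=200))       # False
    print(FlushPolicy().should_flush(unflushed=1_000_000, last_flush_ms=0,
                                     now_ms=10**9))                            # False
```

With the defaults, the policy never asks for a flush, which matches the behavior described above: durability comes from replication, and writing to disk is left to the OS.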
8 Summary
1. Kafka brokers use the file system as both cache and storage, benefiting from OS optimizations.
2. Kafka writes to the page cache synchronously and flushes to disk asynchronously, which is the main reason for its high write throughput.
3. Kafka’s sequential read/write pattern benefits from OS read‑ahead, which reduces costly disk I/O and speeds up data retrieval.
MaGe Linux Operations
