Linux Page Cache Optimization for Kafka: Concepts, Parameter Tuning, and Performance Evaluation
This article explains Linux Page Cache fundamentals, shows how to inspect and reclaim the cache, and walks through tuning the vm.dirty_* and vm.swappiness parameters to smooth Kafka write traffic, reduce I/O spikes, and improve overall performance. It covers the optimization background, basic Page Cache concepts, how to adjust the relevant kernel parameters, and a performance comparison before and after optimization, illustrated with before-and-after benchmarks.
1. Optimization Background
When business volume grows to trillions of records per day, Kafka clusters come under enormous disk I/O pressure, and disk I/O becomes the biggest performance bottleneck. Sudden spikes in inbound or outbound traffic can saturate the disks, causing request failures and even cascading broker failures.
This article focuses on one optimization: tuning Linux Page Cache parameters.
2. Basic Concepts
What is Page Cache? Page Cache is a memory cache for the file system that stores file data in RAM to reduce disk I/O.
Two reasons make it effective:
Disk access is orders of magnitude slower than memory.
Frequently accessed data is likely to be accessed again.
Read Cache
When a read request arrives, the kernel first checks if the data is already in Page Cache. If it is, the request is served from memory (cache hit). If not, the kernel reads from disk, caches the data, and subsequent reads hit the cache.
Write Cache
Write requests are written to the cache first; the underlying storage is not updated immediately. The kernel marks the pages as dirty, adds them to the dirty list, and periodically flushes them to disk. Flushing is triggered when either of the following conditions is met:
The dirty data has existed longer than vm.dirty_expire_centisecs (default 3000 centiseconds, i.e. 30 s).
The proportion of dirty pages exceeds vm.dirty_background_ratio (default 10 % of available memory).
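Both triggers can be read directly from procfs on a Linux host (a quick check, assuming a standard /proc layout):

```shell
# Age-based trigger: how long a page may stay dirty, in centiseconds
cat /proc/sys/vm/dirty_expire_centisecs
# Size-based trigger: dirty share (percent) at which background flushing starts
cat /proc/sys/vm/dirty_background_ratio
```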
3. Page Cache Inspection Tool
The cachestat tool can be used to view cache statistics.
Installation:

```shell
mkdir -p /opt/bigdata/app/cachestat
cd /opt/bigdata/app/cachestat
git clone --depth 1 https://github.com/brendangregg/perf-tools
```

Running the tool and interpreting its output (HITS, MISSES, DIRTIES, RATIO, BUFFERS_MB, CACHED_MB) are shown in the table in the original article.
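If installing perf-tools is not an option, a rough view of the same quantities is available from /proc/meminfo (this shows current totals rather than the per-second rates cachestat reports):

```shell
# Current cache occupancy and outstanding dirty pages, straight from the kernel
grep -E '^(Buffers|Cached|Dirty|Writeback):' /proc/meminfo
```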
4. How to Reclaim Page Cache
Execute the following (as root):

```shell
sync                               # write dirty pages back to disk first
echo 1 > /proc/sys/vm/drop_caches  # drop the (clean) page cache
```

After reclamation, buff/cache should be close to zero unless there are ongoing writes.
Images before and after reclamation illustrate the effect.
5. Parameter Tuning
View the current Page Cache parameters:

```shell
sysctl -a | grep dirty
```

Default values (Linux kernel):

```shell
vm.dirty_background_bytes = 0       # byte-based alternative to vm.dirty_background_ratio
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0                  # byte-based alternative to vm.dirty_ratio
vm.dirty_ratio = 20
vm.dirty_expire_centisecs = 3000    # 30 s
vm.dirty_writeback_centisecs = 500  # 5 s
```

Potential problems when a large amount of dirty data stays cached:
Higher risk of data loss on power failure.
Long I/O spikes that degrade write performance.
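To see what the ratios mean in absolute terms, the byte thresholds can be worked out by hand (a sketch for a hypothetical 64 GiB broker; the kernel actually computes these against reclaimable memory, so real numbers will differ somewhat):

```shell
mem=$((64 * 1024 * 1024 * 1024))   # hypothetical 64 GiB of memory
bg=$((mem * 10 / 100))             # vm.dirty_background_ratio = 10
hard=$((mem * 20 / 100))           # vm.dirty_ratio = 20
echo "background flush starts at $((bg / 1024 / 1024)) MiB of dirty pages"
echo "writers stall at $((hard / 1024 / 1024)) MiB of dirty pages"
```

With the defaults, gigabytes of dirty data can pile up before any flushing begins, which is exactly what produces the long I/O spikes described above.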
Optimization Recommendations
vm.dirty_background_ratio : Reduce the ratio to trigger more frequent, smaller flushes, smoothing write spikes. Example setting:
```shell
# Temporary change
sysctl -w vm.dirty_background_ratio=1
# Permanent change
echo "vm.dirty_background_ratio=1" >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
```

Alternative: create a dedicated file in /etc/sysctl.d/:

```shell
touch /etc/sysctl.d/kafka-optimization.conf
echo "vm.dirty_background_ratio=1" >> /etc/sysctl.d/kafka-optimization.conf
sysctl --system
```

vm.dirty_ratio : Increase for heavy write workloads, decrease for lighter workloads. If the dirty-page proportion exceeds this ratio, the kernel stalls all application writes until enough data has been flushed.
vm.dirty_expire_centisecs : Works together with vm.dirty_background_ratio . It defines a time‑based flush trigger, ensuring data is persisted even if the size threshold is not reached.
vm.dirty_writeback_centisecs : Smaller values increase flush frequency; ensure the interval is long enough for the flush to complete.
vm.swappiness : Set to 0 (or 1) so the kernel avoids swapping out pages in favor of reclaiming page cache. Note that on modern kernels a value of 0 does not disable swap entirely; it only makes the kernel swap as a last resort.
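The current value can be checked, and changed, the same way as the dirty-page knobs (the write is shown commented out because it requires root):

```shell
cat /proc/sys/vm/swappiness     # default is usually 60
# sysctl -w vm.swappiness=0     # requires root; a value of 1 is also commonly recommended for Kafka
```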
6. Performance Comparison
After tuning, the following metrics improved:
Write traffic became smoother, with fewer spikes.
Disk I/O utilization showed a flatter curve.
Network inbound traffic remained unchanged.
Charts illustrating these improvements are included in the original article.
7. Summary
Different hardware configurations may yield varying absolute results, but the trend of parameter changes is consistent:
Increasing vm.dirty_background_ratio or vm.dirty_expire_centisecs leads to larger traffic and I/O spikes.
Decreasing those parameters smooths traffic and I/O.
Setting vm.dirty_ratio too low (<10) creates periodic traffic valleys due to write stalls.
Setting vm.dirty_ratio high (>40) keeps traffic smooth.
An example of a well‑balanced configuration: vm.dirty_background_ratio=1 , vm.dirty_ratio=80 , vm.dirty_expire_centisecs=1000 .
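Assuming these values suit the workload, the balanced configuration above could be persisted in a single drop-in file (the file name is illustrative):

```
# /etc/sysctl.d/kafka-optimization.conf  (apply with: sysctl --system)
vm.dirty_background_ratio = 1
vm.dirty_ratio = 80
vm.dirty_expire_centisecs = 1000
vm.swappiness = 0
```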
The overall flow of traffic remains unaffected while the system becomes more stable.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.