Linux Kernel File Cache Mechanism: In-Depth Analysis of Read/Write Paths and the Read-Ahead Algorithm
This article explains the Linux kernel file cache in depth: how read() and write() requests travel from the VFS down to the block layer, how the adaptive read-ahead algorithm preloads sequential pages, and how internal functions such as generic_file_buffered_read() and ondemand_readahead() work. It closes with the write path and previews follow-up topics: mmap and cache reclamation.
This article provides a comprehensive technical analysis of the Linux File Cache mechanism, a complex subsystem within Linux memory management.
1. File Cache Overview
The Linux file cache manages memory the kernel allocates for storing file data. When a user reads a file, the OS first allocates memory, reads the data from storage into it, and then copies it to the user's buffer; when writing, it first allocates memory to receive the user's data and writes it to disk later. The cache includes regular file data pages, directory pages, data pages read directly from block device files, the swapcache, and pages of special filesystems (e.g., the shm filesystem). Most of the cache is file cache, split into active and inactive portions (Active(file) and Inactive(file) in /proc/meminfo).
2. File Cache Mechanism Framework
At the system level, a read/write request flows through: the read() system call → VFS (Virtual File System) → page cache → the specific filesystem (ext3/ext4) → the generic block layer → the I/O scheduler layer → the block device driver → the disk. The cache mechanism sits between the VFS and the specific filesystems.
Internally, File Cache is divided into two parts: cache generation and cache reclamation. Cache generation includes read-ahead mechanisms for file reading (via read and mmap) and write file processes.
3. Read File Process via read() System Call
In the kernel, read() passes through six stages before the block device driver finally executes the data transfer. The key functions involved are page_cache_sync_readahead() and page_cache_async_readahead(), which perform synchronous and asynchronous read-ahead respectively.
4. Read-Ahead Mechanism
Read-ahead reads batches of consecutive file pages from disk into memory before they are actually accessed. The kernel uses a "window" concept with two windows: the current window (pages already read) and the ahead window (pages being read ahead). The rules are: only sequential reads trigger read-ahead, random reads do not; sequential reads progressively grow the read-ahead size until it reaches the maximum; and if all of a file's pages are already cached, read-ahead stops. The kernel tracks this state in struct file_ra_state, defined in include/linux/fs.h.
5. generic_file_buffered_read Analysis
This function handles buffered reading: it looks up pages in the page cache tree via find_get_page(), performs synchronous or asynchronous read-ahead via page_cache_sync_readahead() and page_cache_async_readahead(), waits for page data with wait_on_page_locked_killable(), and finally copies the data to user space.
6. ondemand_readahead Function
This is the core function of adaptive read-ahead, and it distinguishes two scenarios: reads starting at the head of the file and reads elsewhere. For sequential reads it grows the window on each trigger via get_next_ra_size(), in most cases doubling the previous read-ahead page count. For initial reads, get_init_ra_size() sizes the window from the user's request size and the maximum allowed page count.
7. Write File Process
Unlike reading, file writing has no read-ahead pattern. The flow is: sys_write → VFS → ext4_write_begin() (allocates pages if they are not yet in memory and adds them to the cache via add_to_page_cache_lru()) → ext4_write_end() (initiates the I/O). Direct I/O (O_DIRECT), which bypasses the page cache entirely, is not discussed in the article.
8. mmap and Cache Reclamation
The mmap read path, which involves page faults and virtual memory management, will be covered in the second article of this series; cache reclamation, which involves the LRU lists, the workingset mechanism, the shrinker mechanism, and dirty page management, will be covered in the final article.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials