
How Linux Kernel Performs File Readahead: An Illustrated Walkthrough

This article explains the design and implementation of file readahead in the Linux 3.12 kernel, using sequential, random, and multithreaded read scenarios to show how the kernel initializes readahead windows, triggers synchronous and asynchronous prefetches, and updates the page cache.

Linux Code Review Hub

Overview

The article describes the design and implementation of file system readahead in the Linux 3.12 kernel. Readahead means the file system reads more data than requested and caches it in the page cache, so subsequent reads can be served faster without the application noticing.

Scenario 1: Sequential Read

// Example code
{
    ...
    f = open("file", ...);
    ret = read(f, buf, 4096);        /* page 0    */
    ret = read(f, buf, 2 * 4096);    /* pages 1-2 */
    ret = read(f, buf, 4 * 4096);    /* pages 3-6 */
    ...
}

In this simple case the file is opened and three sequential reads are performed. The kernel first looks for the requested page in the page cache. Because the first read misses, a synchronous readahead is triggered.

static void do_generic_file_read(struct file *filp, loff_t *ppos,
                                 read_descriptor_t *desc, read_actor_t actor)
{
    ...
    for (;;) {
        ...
        cond_resched();
    find_page:
        page = find_get_page(mapping, index);
        if (!page) {
            /* Page not cached: start synchronous readahead. */
            page_cache_sync_readahead(mapping, ra, filp,
                                      index,
                                      last_index - index);
        }
    }
}

The synchronous readahead logic eventually calls:

static unsigned long ondemand_readahead(
    struct address_space *mapping,
    struct file_ra_state *ra,
    struct file *filp,
    bool hit_readahead_marker,
    pgoff_t offset,
    unsigned long req_size)
{
    unsigned long max = max_sane_readahead(ra->ra_pages);
    // First read: initialize the readahead window
    if (!offset)
        goto initial_readahead;
    ...
initial_readahead:
    ra->start = offset;
    ra->size = get_init_ra_size(req_size, max);
    /* If the window is larger than the request, the tail becomes the
       async part; otherwise re-arm the marker over the whole window
       (get_init_ra_size() may cap ra->size below req_size). */
    ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
readit:
    /* Will this read hit the readahead marker made by itself?
       If so, trigger the readahead marker now, and merge the
       resulted next readahead window into the current one. */
    if (offset == ra->start && ra->size == ra->async_size) {
        ra->async_size = get_next_ra_size(ra, max);
        ra->size += ra->async_size;
    }
    return ra_submit(ra, mapping, filp);
}
Throughout this walkthrough the readahead window is written as the triple (ra->start, ra->size, ra->async_size).

For the example the initial readahead window becomes (0, 4, 3). After submitting the first read, the kernel has prefetched pages 0-3 and set the PG_readahead flag on page 1. When page 1 is accessed later, an asynchronous readahead will be triggered.

During the second read (offset = 4096, size = 8192) the kernel translates the request into page units (index = 1, count = 2) and reads pages 1 and 2, which are already in memory thanks to the first readahead. Because page 1 carries the PG_readahead flag, an asynchronous readahead is started.

find_page:
    ...
    page = find_get_page(mapping, index);
    if (!page) {
        page_cache_sync_readahead(mapping, ra, filp,
                                  index,
                                  last_index - index);
        page = find_get_page(mapping, index);
        if (unlikely(page == NULL))
            goto no_cached_page;
    }
    if (PageReadahead(page)) {
        page_cache_async_readahead(mapping, ra, filp,
                                  page, index,
                                  last_index - index);
    }

The asynchronous readahead updates the window to (4, 8, 8) and marks page 4 for future prefetch.

After the first two reads the page cache holds pages 0-3 from the initial synchronous readahead and pages 4-11 from the asynchronous one. The third read requests pages 3-6; because page 4 still carries the PG_readahead flag, another asynchronous readahead is triggered, advancing the window to (12, 16, 16).

In summary, three sequential reads combined with the kernel's readahead logic result in a page‑cache state where most of the accessed pages are already present, and asynchronous prefetches keep the cache ahead of the application's sequential access pattern.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Kernel, Linux, file system, page cache, readahead
Written by

Linux Code Review Hub

A professional Linux technology community and learning platform covering the kernel, memory management, process management, file system and I/O, performance tuning, device drivers, virtualization, and cloud computing.
