Fundamentals 28 min read

Understanding the Linux Kernel Buffer Cache: Structure, Initialization, and Management

This article explains the role of the Linux kernel buffer cache in file I/O, describes its data structures such as buffer_head, details how buffers are organized with hash tables and free lists, and walks through initialization, allocation, release, synchronization, and optimization techniques with code examples.

Deepin Linux
Deepin Linux
Deepin Linux
Understanding the Linux Kernel Buffer Cache: Structure, Initialization, and Management

1. Buffer Cache Overview

1.1 What is a buffer cache

In the Linux kernel, a buffer cache is a region of memory that temporarily stores file data, acting as a bridge between the file system and block devices to improve I/O efficiency.

1.2 Why a buffer cache is needed

Because block device access is much slower than memory access, the kernel allocates a pool of buffer blocks (buffer cache) to reduce the number of direct device reads/writes, thereby boosting overall system performance.

1.3 How the buffer cache works

When reading, the kernel first checks if the required data is already in a buffer (cache hit). If not (cache miss), it reads the block from the device into a buffer. When writing, data is first placed in a buffer and later flushed to the device either when the buffer is full or when explicit sync operations are invoked.

2. Buffer Cache Structure

2.1 Key data structures

The core structure is struct buffer_head , defined in the Linux 0.11 source as:

struct buffer_head {
    char *b_data;            /* pointer to 1024‑byte data block */
    unsigned long b_blocknr;/* block number */
    unsigned short b_dev;    /* device number (0 = free) */
    unsigned char b_uptodate;/* data up‑to‑date flag */
    unsigned char b_dirt;   /* dirty flag */
    unsigned char b_count;  /* users of this buffer */
    unsigned char b_lock;   /* lock flag */
    struct task_struct *b_wait;/* tasks waiting for unlock */
    struct buffer_head *b_prev, *b_next;      /* hash queue links */
    struct buffer_head *b_prev_free, *b_next_free;/* free list links */
};

Each member is explained in the article (data pointer, block number, device number, up‑to‑date flag, dirty flag, reference count, lock, wait queue, hash links, and free‑list links).

2.2 Organization of buffers

Buffers are organized using a double‑linked free list and a hash table. The free list provides quick allocation of unused buffers, while the hash table (indexed by a hash of device number and block number) enables fast lookup of already cached blocks.

3. Buffer Cache Initialization and Management

3.1 Initialization process

During kernel boot, buffer_init sets up the buffer pool. The function determines the high address of the buffer area, divides it into 1 KB blocks, creates a buffer_head for each block, links them into the free list, and clears the hash table.

void buffer_init(long buffer_end)
{
    struct buffer_head *h = start_buffer;
    void *b;
    int i;

    if (buffer_end == 1 << 20)
        b = (void *)(640 * 1024);
    else
        b = (void *)buffer_end;

    while ((b -= BLOCK_SIZE) >= ((void *)(h + 1))) {
        h->b_dev = 0;
        h->b_dirt = 0;
        h->b_count = 0;
        h->b_lock = 0;
        h->b_uptodate = 0;
        h->b_wait = NULL;
        h->b_next = NULL;
        h->b_prev = NULL;
        h->b_data = (char *)b;
        h->b_prev_free = h - 1;
        h->b_next_free = h + 1;
        h++;
        NR_BUFFERS++;
        if (b == (void *)0x100000)
            b = (void *)0xA0000;
    }
    h--;
    free_list = start_buffer;
    free_list->b_prev_free = h;
    h->b_next_free = free_list;

    for (i = 0; i < NR_HASH; i++)
        hash_table[i] = NULL;
}

3.2 Allocation and release of buffers

Buffers are allocated with getblk and released with brelse . getblk first searches the hash table; if the buffer is not present, it picks a free buffer, updates its fields, and returns it. brelse decrements the reference count and, when it reaches zero, puts the buffer back on the free list (writing it out first if it is dirty).

struct buffer_head *getblk(int dev, int blocknr)
{
    struct buffer_head *bh;
    int hash = (blocknr ^ dev) % NR_HASH;
    bh = hash_table[hash];
    while (bh) {
        if (bh->b_dev == dev && bh->b_blocknr == blocknr) {
            if (!bh->b_lock) {
                bh->b_count++;
                return bh;
            } else {
                sleep_on(&bh->b_wait);
            }
        }
        bh = bh->b_next;
    }
    bh = free_list;
    while (bh) {
        if (!bh->b_count && !bh->b_dirt) {
            remove_from_queues(bh);
            bh->b_dev = dev;
            bh->b_blocknr = blocknr;
            bh->b_count = 1;
            insert_into_queues(bh);
            return bh;
        }
        bh = bh->b_next_free;
    }
    // fallback handling omitted
}

void brelse(struct buffer_head *bh)
{
    if (!bh) return;
    bh->b_count--;
    if (!bh->b_count) {
        if (bh->b_dirt)
            ll_rw_block(WRITE, bh);
        remove_from_queues(bh);
        bh->b_next_free = free_list;
        bh->b_prev_free = free_list->b_prev_free;
        free_list->b_prev_free->b_next_free = bh;
        free_list->b_prev_free = bh;
    }
}

3.3 Synchronization mechanisms

Functions such as sync , fsync , and fdatasync ensure that dirty buffers are written to the underlying block device. sync queues all dirty buffers, fsync forces the write for a specific file and waits for completion, while fdatasync flushes only file data, not metadata.

4. Buffer Cache Operation Functions

4.1 Read functions

The kernel provides bread , breada , and bread_page for reading. bread obtains a buffer with getblk , reads from the device if the buffer is not up‑to‑date, and returns the buffer. breada also pre‑reads adjacent blocks, and bread_page reads a whole page (typically four blocks) at once.

struct buffer_head *bread(int dev, int block)
{
    struct buffer_head *bh = getblk(dev, block);
    if (!bh->b_uptodate)
        ll_rw_block(READ, bh);
    wait_on_buffer(bh);
    if (bh->b_uptodate)
        return bh;
    brelse(bh);
    return NULL;
}

4.2 Write functions

Writing uses bwrite , which marks a buffer dirty and, if it is the sole user, immediately issues a write to the device; otherwise the buffer is released and will be written later by the sync machinery.

void bwrite(struct buffer_head *bh)
{
    if (!bh) return;
    if (bh->b_dirt) return;
    bh->b_dirt = 1;
    if (bh->b_count > 1) {
        brelse(bh);
        return;
    }
    ll_rw_block(WRITE, bh);
}

5. Performance Optimization Strategies

5.1 Tuning tips

Increasing the buffer cache size (e.g., adjusting vm.dirty_ratio and vm.dirty_background_ratio ) reduces I/O frequency but consumes more RAM. Optimizing the hash function and free‑list management improves lookup and allocation speed. Adjusting sync policies (delaying writes for low‑priority workloads or forcing immediate sync for databases) balances performance and data safety.

5.2 Common issues and solutions

Buffer contention is mitigated with proper locking (the b_lock field). Data loss after crashes is avoided by using journaling file systems (e.g., ext4) and by regularly invoking sync or fsync to flush dirty buffers.

Memory ManagementkernelLinuxOperating SystemFile SystemBuffer Cache
Deepin Linux
Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.