
Understanding Linux Writeback: How Data Moves from Memory to Disk

This article explains the Linux kernel writeback path, detailing how data travels from user space through page cache, kernel buffers, and the disk controller, and shows how to tune dirty page thresholds and writeback threads for optimal performance.

360 Zhihui Cloud Developer

Writing data to disk: a multi-stage writeback process

The client issues a write, which is received by the database, passed to the kernel via a system call, transferred to the disk controller, and finally written to the physical media.

1. The client sends a write command to the database (data is in the client's memory).
2. The database receives the write (data is in the server's memory).
3. The database calls the system call that writes the data to disk (data is in the kernel's buffer).
4. The operating system transfers the write buffer to the disk controller (data is in the disk cache).
5. The disk controller actually writes the data to a physical medium (magnetic disk, NAND chip, ...).

At step 3, if the process crashes but the OS stays alive, the data can still be flushed to disk. After step 4, following an fsync, the data is guaranteed to have reached the disk controller, and the controller normally ensures persistence on the physical medium.

When does the page cache flush to disk, and how can we control the policy?

In kernel 2.6.32 the main writeback mechanism is the per-device (per-BDI) flusher threads, which replaced the older pdflush process; a separate sync_supers thread handles periodic superblock writeback.

The kernel bdi module

Running grep Dirty /proc/meminfo shows the amount of dirty page-cache memory (in kB).

backing_dev_info: each block device has a backing_dev_info structure, usually attached to the device’s request queue.

bdi_writeback: encapsulates the writeback thread; it extracts work items and executes them.

bdi_writeback_work: represents a single writeback task, which can use different flushing strategies. In 2.6.32 it consists of wb_writeback_args and bdi_work.

<code>struct backing_dev_info {
    struct list_head bdi_list;
    struct rcu_head rcu_head;
    unsigned long ra_pages;
    unsigned long state;
    unsigned int capabilities;
    congested_fn *congested_fn;
    void *congested_data;
    void (*unplug_io_fn)(struct backing_dev_info *, struct page *);
    void *unplug_io_data;
    char *name;

    struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
    struct prop_local_percpu completions;
    int dirty_exceeded;

    unsigned int min_ratio;
    unsigned int max_ratio, max_prop_frac;

    struct bdi_writeback wb;
    spinlock_t wb_lock;
    struct list_head wb_list;
    unsigned long wb_mask;
    unsigned int wb_cnt;

    struct list_head work_list;
    struct device *dev;

#ifdef CONFIG_DEBUG_FS
    struct dentry *debug_dir;
    struct dentry *debug_stats;
#endif
};
</code>

bdi_register creates a default bdi-default thread that runs bdi_forker_task; this task checks each backing device and launches a per-device flush thread when one is needed. The flush threads eventually invoke wb_do_writeback and writeback_inodes_wb to perform the actual writeback.

Another default thread, sync_supers, periodically flushes the superblocks.

Writeback behavior can be tuned via parameters under /proc/sys/vm/ :

<code>int dirty_background_ratio = 10;                  /* start background writeback when dirty pages exceed this % */
unsigned long dirty_background_bytes;             /* if non-zero, overrides the ratio */
int vm_dirty_ratio = 20;                          /* block writers when dirty pages exceed this % */
unsigned int dirty_writeback_interval = 5 * 100;  /* interval for kupdate-style writebacks (centiseconds) */
unsigned int dirty_expire_interval = 30 * 100;    /* max time dirty data may stay in memory (centiseconds) */
</code>

dirty_background_ratio triggers background flush when dirty pages exceed the percentage; dirty_ratio blocks write‑issuing processes when the threshold is passed.

Performance tests with dd show that a small dirty_ratio quickly blocks writers and reduces throughput, while a large value allows near‑disk write speed.

<code># Small dirty_ratio: writers are throttled almost immediately
sudo sh -c 'echo 0 > /proc/sys/vm/dirty_ratio'
cat /proc/sys/vm/dirty_ratio
dd if=/dev/zero of=file-abc bs=1M count=30000

# Large dirty_ratio: writes proceed at near page-cache speed
sudo sh -c 'echo 100 > /proc/sys/vm/dirty_ratio'
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=/dev/zero of=file-abc bs=1M count=30000
</code>

Since kernel 3.10, the bdi-default thread and the per-device flusher threads have been replaced by a workqueue named writeback whose work items run on kworker threads:

<code>bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND | WQ_SYSFS, 0);
</code>

Summary recommendations

Which combination fits depends on the workload: small values flush early and keep stalls short at some cost in throughput, large values trade latency spikes and more data at risk for peak throughput, and an asymmetric pair starts background flushing early while rarely blocking writers:

<code>vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

vm.dirty_background_ratio = 50
vm.dirty_ratio = 80

vm.dirty_background_ratio = 5
vm.dirty_ratio = 80
</code>
Memory Management · Kernel · Linux · Data Persistence · Writeback · VM Parameters
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
