How lxcfs Provides Accurate /proc and /sys Isolation for Containers
This article explains how lxcfs uses FUSE and cgroups to isolate /proc and /sys files inside containers. It walks through the implementation of cpuonline and loadavg handling, with key source code snippets that illustrate the isolation mechanism.
What is lxcfs
lxcfs is a user‑space virtual file system built on FUSE and cgroup technology that isolates the /proc and /sys paths inside a container, allowing commands such as top and free to display the container’s real resource usage instead of the host’s values.
Reading lxcfs Filesystem
When lxcfs starts, it mounts a directory (e.g., /var/lib/lxcfs) as its entry point. Reads from this mount point pass through the kernel’s VFS and FUSE layers, which call back into lxcfs’s file‑operation functions. lxcfs then obtains the calling container’s cgroup information, reads the corresponding host files, and returns the computed values to the container.
Source Code Overview
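A function marked with the constructor attribute runs automatically when the code is loaded; as its name suggests, it collects and mounts the cgroup subsystems:
<code>static void __attribute__((constructor)) collect_and_mount_subsystems(void) { ... }</code>
The program registers its FUSE operations in lxcfs_ops, providing callbacks for getattr, open, read, write, and the other file operations needed to serve virtual files.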
<code>const struct fuse_operations lxcfs_ops = { .getattr = lxcfs_getattr, .open = lxcfs_open, .read = lxcfs_read, .write = lxcfs_write, ... };</code>
cpuonline Implementation
The lxcfs_read function dispatches reads based on the path prefix (/cgroup, /proc, /sys). For /sys reads it calls do_sys_read, which eventually invokes sys_devices_system_cpu_online_read to obtain the container’s CPU online information.
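The prefix dispatch described above can be sketched as follows. This is a minimal illustration, not the lxcfs source: the enum tree type and the has_prefix and classify_path helpers are hypothetical names introduced here, and the real lxcfs_read calls per-tree read functions rather than returning an enum.

```c
#include <string.h>

/* Hypothetical tags for the three virtual trees named in the article. */
enum tree { TREE_CGROUP, TREE_PROC, TREE_SYS, TREE_NONE };

/* Return 1 if path equals prefix or continues it with a '/' separator,
 * so that "/sys/devices" matches "/sys" but "/system" does not. */
static int has_prefix(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(path, prefix, n) == 0 &&
           (path[n] == '\0' || path[n] == '/');
}

/* Decide which tree should serve a read for the given path. */
enum tree classify_path(const char *path)
{
    if (has_prefix(path, "/cgroup")) return TREE_CGROUP;
    if (has_prefix(path, "/proc"))   return TREE_PROC;
    if (has_prefix(path, "/sys"))    return TREE_SYS;
    return TREE_NONE;
}
```

A read callback would switch on the returned tag and forward the buffer, size, and offset arguments to the matching handler.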
<code>int sys_devices_system_cpu_online_read(char *buf, size_t size, off_t offset, struct fuse_file_info *fi) { ... }</code>
This function retrieves the container’s init PID, finds its cgroup, checks the CPU quota and cpuset, and then formats the isolated cpuonline value (e.g., 0‑3 for a 4‑CPU container).
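As a rough sketch of the last two steps, the code below derives a visible CPU count from a CFS quota and period and then formats it as a range. It assumes the quota and period have already been read from the container's cgroup and that the cpuset size is known; visible_cpus and format_cpu_online are hypothetical helper names, and the rounding rule (round the quota up, cap at the cpuset size) is an assumption rather than a confirmed detail of lxcfs.

```c
#include <stdio.h>

/* Number of CPUs the container should see.
 * quota_us <= 0 is treated as "no quota", so only the cpuset limits the view. */
int visible_cpus(long quota_us, long period_us, int cpuset_cpus)
{
    if (quota_us <= 0 || period_us <= 0)
        return cpuset_cpus;

    /* Round up: a quota of 1.5 periods needs 2 CPUs to be usable. */
    long n = (quota_us + period_us - 1) / period_us;
    if (n > cpuset_cpus)
        n = cpuset_cpus;
    return (int)n;
}

/* Write "0-(n-1)\n", or "0\n" for a single CPU, into buf.
 * Returns the value returned by snprintf. */
int format_cpu_online(char *buf, size_t size, int ncpus)
{
    if (ncpus <= 1)
        return snprintf(buf, size, "0\n");
    return snprintf(buf, size, "0-%d\n", ncpus - 1);
}
```

With a quota of 400000 µs per 100000 µs period on a 40-core host, this sketch yields 4 visible CPUs and the string 0-3, matching the example in the text.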
loadavg Implementation
lxcfs runs a background daemon thread that, every 5 seconds, traverses a hash table of containers and updates their load averages using the same algorithm as the Linux kernel. The thread reads each container’s task list, counts runnable tasks, and computes the three load‑average values.
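The kernel algorithm referred to here is a fixed-point exponential moving average sampled every 5 seconds. The sketch below reproduces that calculation with the kernel's constants (FSHIFT, FIXED_1, EXP_1, EXP_5, EXP_15); struct loadavg and update_load are hypothetical names used only for this illustration and are not taken from lxcfs.

```c
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884               /* 1/exp(5s/1min) in fixed point */
#define EXP_5    2014               /* 1/exp(5s/5min) */
#define EXP_15   2037               /* 1/exp(5s/15min) */

/* One step of the moving average: load and active are fixed-point values. */
unsigned long calc_load(unsigned long load, unsigned long decay,
                        unsigned long active)
{
    unsigned long newload = load * decay + active * (FIXED_1 - decay);
    if (active >= load)
        newload += FIXED_1 - 1;     /* round up while the load is rising */
    return newload / FIXED_1;
}

/* Hypothetical per-container state holding the 1, 5 and 15 minute averages. */
struct loadavg {
    unsigned long avenrun[3];
};

/* Fold one 5-second sample (count of R/D tasks) into all three averages. */
void update_load(struct loadavg *la, unsigned int runnable)
{
    unsigned long active = (unsigned long)runnable * FIXED_1;

    la->avenrun[0] = calc_load(la->avenrun[0], EXP_1, active);
    la->avenrun[1] = calc_load(la->avenrun[1], EXP_5, active);
    la->avenrun[2] = calc_load(la->avenrun[2], EXP_15, active);
}
```

Because each sample is weighted by a fixed decay factor, a container that keeps one task runnable converges toward a load of 1.0 (FIXED_1), with the 1-minute average reacting fastest.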
<code>void *load_begin(void *arg) { while (1) { /* traverse hash, update load */ } }</code>
The refresh_load function reads /proc/<pid>/task/*/status to count tasks in R or D state, then updates the container’s avenrun fields.
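The per-task check can be sketched as a small parser over the text of a status file. The helper name status_is_active is hypothetical, and the sketch takes the file contents as a string so that it stays independent of how lxcfs actually opens and iterates the task directory.

```c
#include <string.h>

/* Return 1 if the given status text has a "State:" line reporting
 * R (running) or D (uninterruptible sleep), otherwise 0. */
int status_is_active(const char *status)
{
    const char *line = status;

    while (line && *line) {
        if (strncmp(line, "State:", 6) == 0) {
            const char *p = line + 6;
            /* Skip the whitespace between the key and the state letter. */
            while (*p == ' ' || *p == '\t')
                p++;
            return *p == 'R' || *p == 'D';
        }
        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}
```

Summing this result over every task of every process in the container gives the runnable count that feeds the load-average update.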
<code>static int refresh_load(struct load_node *p, char *path) { ... }</code>
When a container reads /proc/loadavg, proc_loadavg_read looks up the container’s node in the hash table and returns the cached values.
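Turning cached fixed-point averages into the familiar decimal text can be sketched as below. It follows the kernel's LOAD_INT and LOAD_FRAC convention; format_loadavg is a hypothetical helper, and the sketch prints only the three averages, leaving out the remaining fields of the real /proc/loadavg.

```c
#include <stdio.h>

#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)
#define LOAD_INT(x)  ((x) >> FSHIFT)                          /* integer part */
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)    /* two decimals */

/* Format three fixed-point averages as "a.bb c.dd e.ff\n".
 * Returns the value returned by snprintf. */
int format_loadavg(char *buf, size_t size, const unsigned long avenrun[3])
{
    /* Add half of 0.01 so the two printed decimals are rounded, not truncated. */
    unsigned long a = avenrun[0] + FIXED_1 / 200;
    unsigned long b = avenrun[1] + FIXED_1 / 200;
    unsigned long c = avenrun[2] + FIXED_1 / 200;

    return snprintf(buf, size, "%lu.%02lu %lu.%02lu %lu.%02lu\n",
                    LOAD_INT(a), LOAD_FRAC(a),
                    LOAD_INT(b), LOAD_FRAC(b),
                    LOAD_INT(c), LOAD_FRAC(c));
}
```

Serving cached values this way keeps the read path cheap: the expensive task scan happens only in the periodic refresh, not on every read.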
<code>static int proc_loadavg_read(char *buf, size_t size, off_t offset, struct fuse_file_info *fi) { ... }</code>
Isolation Effect
Tests show that both cpuonline and loadavg are correctly isolated: a container with 2 CPU cores reports 0‑1 for cpuonline and appropriate load‑average numbers based only on its own tasks, while the host retains its full 40‑core view.
360 Zhihui Cloud Developer