Implementing Container Resource View Isolation with Lxcfs and Kubernetes Admission Webhook

This article explains why container resource view isolation is needed, outlines common scenarios where lack of isolation causes issues, and demonstrates how to achieve isolation using Lxcfs together with a Kubernetes mutating admission webhook, including configuration details and sample scripts.

360 Tech Engineering

Feb 7, 2020

Engineers are used to running commands such as top and free on physical or virtual machines to monitor system resources, but inside a container these commands still report host-level data, so the resource view is inaccurate.

Why isolate container resource views?

Containers speed up packaging and startup but isolate less than VMs do, notably for the /proc and /sys filesystems. Processes inside a container see host-wide metrics, which can lead to JVM heaps sized for the host's memory, thread counts based on the host's CPUs, and other security or performance problems.

Typical scenarios affected:

In production, operators expect top / free to reflect container limits, but they still show host values.

JVMs size their heap from the host's memory rather than the container quota, so the heap can exceed the quota and the application fails to start.

Applications like Nginx read /proc/cpuinfo and assume the host CPU count, leading to sub-optimal worker or thread counts.
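
A quick way to see the mismatch is to compare the MemTotal line that free reads from /proc/meminfo with the memory limit recorded in the container's cgroup. The following is a minimal sketch, assuming cgroup v1 paths and a container that has a memory limit set; the function names are illustrative and not part of any tool mentioned in this article.

```shell
# Print MemTotal (in kB) from a meminfo-format file; defaults to /proc/meminfo.
meminfo_total_kb() {
  awk '$1 == "MemTotal:" {print $2}' "${1:-/proc/meminfo}"
}

# Print a cgroup v1 memory limit in kB. The file stores the limit in bytes.
# The default path is an assumption (cgroup v1 layout).
cgroup_limit_kb() {
  local bytes
  bytes=$(cat "${1:-/sys/fs/cgroup/memory/memory.limit_in_bytes}")
  echo $(( bytes / 1024 ))
}

# Print "host" when the meminfo file reports more memory than the cgroup
# allows (the un-isolated case), otherwise print "isolated".
memory_view() {
  local total limit
  total=$(meminfo_total_kb "$1")
  limit=$(cgroup_limit_kb "$2")
  if [ "$total" -gt "$limit" ]; then
    echo host
  else
    echo isolated
  fi
}
```

Calling memory_view with no arguments inside a container uses the default paths. A container without a memory limit has a very large cgroup value, so the comparison is only meaningful when a limit is set.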

Solution overview: Use Lxcfs to virtualize the /proc files (and, once supported, /sys/devices/system/cpu/online), and combine it with a Kubernetes mutating admission webhook that mounts the Lxcfs-provided files over the corresponding /proc and /sys paths in each pod's containers.

Lxcfs introduction

LXCFS is a small FUSE filesystem designed to make Linux containers feel more like a virtual machine. It currently virtualizes only the procfs files; support for /sys/devices/system/cpu/online is being merged into the master branch.
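
The /sys/devices/system/cpu/online file holds a comma-separated list of CPU indices and ranges, such as 0-3 or 0-1,4. The sketch below shows how a consumer could turn that text into a CPU count, which illustrates why the count is wrong while the file still lists the host's CPUs. count_online_cpus is a hypothetical helper, not part of Lxcfs.

```shell
# Count the CPUs described by a cpu "online" string such as "0-3" or "0-1,4".
count_online_cpus() {
  local spec="$1" total=0 part lo hi
  local IFS=','
  for part in $spec; do
    lo=${part%-*}   # text before the dash, or the whole entry if there is none
    hi=${part#*-}   # text after the dash, or the whole entry if there is none
    total=$(( total + hi - lo + 1 ))
  done
  echo "$total"
}
```

For example, count_online_cpus "$(cat /sys/devices/system/cpu/online)" prints the number of CPUs visible at that path.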

When deploying Lxcfs in a production cluster, consider:

The current release (3.1.2) does not virtualize /sys/devices/system/cpu/online out of the box; you must compile the master branch if you need that file.

CPU-related files must be virtualized because runtimes such as the JVM and Nginx read them to decide how many threads or worker processes to start.

Lxcfs must run on every Kubernetes node. If the Lxcfs process crashes, the files it previously mounted into containers become invalid and reads of them fail until they are remounted; a systemd unit with an ExecStartPost script can remount them automatically.
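
Because a crashed FUSE daemon can leave its mount point in place while reads against it fail, a health check is more reliable when it actually reads a virtualized file instead of only checking that the mount exists. The following is a minimal sketch of such a probe, assuming the mount root used in the unit file below; lxcfs_healthy is an illustrative name.

```shell
# Succeed only if the Lxcfs-provided meminfo file can be read and looks sane.
# The default root is an assumption taken from the unit file's ExecStart line.
lxcfs_healthy() {
  local root="${1:-/var/lib/lxc/lxcfs}"
  # ${root%/} drops a trailing slash so the joined path stays tidy.
  grep -q '^MemTotal:' "${root%/}/proc/meminfo" 2>/dev/null
}
```

A node-level monitor could call lxcfs_healthy periodically and restart the service when it returns a non-zero status.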

The lxcfs.service unit file:

[Unit]
Description=FUSE filesystem for LXC
ConditionVirtualization=!container
Before=lxc.service
Documentation=man:lxcfs(1)

[Service]
ExecStart=/usr/bin/lxcfs -l /var/lib/lxc/lxcfs/
KillMode=process
Restart=always
Delegate=yes
ExecStopPost=-/bin/fusermount -u /var/lib/lxc/lxcfs
ExecReload=/bin/kill -USR1 $MAINPID
# add remount script
ExecStartPost=/usr/local/bin/container_remount_lxcfs.sh

[Install]
WantedBy=multi-user.target

The remount script, container_remount_lxcfs.sh, which runs each time the service starts:

#!/bin/bash
# Re-bind the Lxcfs files into every running container that mounts /var/lib/lxc.
PATH=$PATH:/bin
LXCFS="/var/lib/lxc/lxcfs"
LXCFS_ROOT_PATH="/var/lib/lxc"
# List container IDs, skipping the header row and the pause and calico containers.
containers=$(docker ps | awk 'NR > 1 && !/pause/ && !/calico/ {print $1}')
for container in $containers; do
  # Host path mounted at /var/lib/lxc inside the container (empty if there is none).
  mountpoint=$(docker inspect --format '{{ range .Mounts }}{{ if eq .Destination "/var/lib/lxc" }}{{ .Source }}{{ end }}{{ end }}' "$container")
  if [ "$mountpoint" = "$LXCFS_ROOT_PATH" ]; then
    echo "remount $container"
    PID=$(docker inspect --format '{{.State.Pid}}' "$container")
    # Enter the container's mount namespace and bind-mount each virtualized file.
    for file in meminfo cpuinfo loadavg stat diskstats swaps uptime; do
      echo nsenter --target "$PID" --mount -- mount -B "$LXCFS/proc/$file" "/proc/$file"
      nsenter --target "$PID" --mount -- mount -B "$LXCFS/proc/$file" "/proc/$file"
    done
    for file in online; do
      echo nsenter --target "$PID" --mount -- mount -B "$LXCFS/sys/devices/system/cpu/$file" "/sys/devices/system/cpu/$file"
      nsenter --target "$PID" --mount -- mount -B "$LXCFS/sys/devices/system/cpu/$file" "/sys/devices/system/cpu/$file"
    done
  fi
done

Admission webhook integration

Kubernetes' extensible admission webhook mechanism lets a service intercept pod creation requests. The mutating webhook patches the pod spec to mount Lxcfs's virtualized proc (and optionally sys) files before the pod is admitted, and the patched spec is what gets persisted to etcd.
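
A mutating webhook returns its changes to the API server as a JSON Patch. The sketch below prints the kind of patch such a webhook might produce: one hostPath volume and one volumeMount per virtualized file. It is an illustration only, not the patch generated by the lxcfs-admission-webhook project. The volume names, the use of container index 0, and the assumption that the pod already has volumes and volumeMounts arrays to append to are all hypothetical.

```shell
# Print a JSON Patch (RFC 6902) that mounts Lxcfs-provided /proc files.
# Usage: lxcfs_patch LXCFS_ROOT FILE...
# Volume names and the container index are illustrative assumptions.
lxcfs_patch() {
  local lxcfs_root="$1" sep="" file
  shift
  printf '['
  for file in "$@"; do
    # Add a hostPath volume that points at the file served by Lxcfs on the node.
    printf '%s{"op":"add","path":"/spec/volumes/-","value":{"name":"lxcfs-proc-%s","hostPath":{"path":"%s/proc/%s"}}}' \
      "$sep" "$file" "$lxcfs_root" "$file"
    sep=","
    # Mount that volume over the matching /proc file in the first container.
    printf '%s{"op":"add","path":"/spec/containers/0/volumeMounts/-","value":{"name":"lxcfs-proc-%s","mountPath":"/proc/%s"}}' \
      "$sep" "$file" "$file"
  done
  printf ']\n'
}
```

For instance, lxcfs_patch /var/lib/lxc/lxcfs meminfo cpuinfo prints four operations, two per file. A real webhook would build the patch from the incoming pod object and handle every container, not just the first.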

Key prerequisites for the webhook service:

Kubernetes version ≥ 1.9.

Enable MutatingAdmissionWebhook and ValidatingAdmissionWebhook on the API server through the --admission-control flag (replaced by --enable-admission-plugins from Kubernetes 1.10 onward).

If the master node lacks kube-proxy, add --enable-aggregator-routing=true to the API server.

Upgrade runc on all nodes to allow mounting of procfs inside containers.
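
One of these prerequisites can be checked from the command line: the admissionregistration.k8s.io API group must be served by the cluster. The helper below is a small sketch that reads the output of kubectl api-versions on standard input; has_admission_api is an illustrative name.

```shell
# Succeed if the API version list on stdin includes the
# admissionregistration.k8s.io group (any version).
has_admission_api() {
  grep -q '^admissionregistration\.k8s\.io/'
}
```

Typical use is kubectl api-versions | has_admission_api && echo "admission webhooks available".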

The author modified the open-source lxcfs-admission-webhook project to also virtualize /sys/devices/system/cpu/online. The code is available on the dev branch: https://github.com/xigang/lxcfs-admission-webhook/tree/dev

After deploying the webhook and Lxcfs, container‑level commands such as top and free correctly report the limits defined by cgroups, achieving true resource view isolation.
