Mastering Process Resource Limits in Docker & Kubernetes: ulimit and cgroup Explained

This article explains how Linux ulimit and cgroup mechanisms can be used to restrict file descriptors, memory, and thread counts in containerized environments, compares Docker and Kubernetes configurations, presents experimental results on fd and thread limits, and offers practical recommendations for configuring default‑ulimits, pod‑max‑pids, and system limits.

MaGe Linux Operations

Sep 6, 2023

Background

In Linux, the ulimit command limits resource usage of processes (file descriptors, thread count, memory, etc.). In containerized environments, similar limits are needed.

Limitation Methods

ulimit: Docker accepts --ulimit when a container starts and can apply daemon-wide defaults through the default-ulimits setting; Kubernetes does not currently support setting ulimits.

cgroup: Docker supports memory, CPU, and PID limits. Because the pids controller counts threads as well as processes, --pids-limit also caps the thread count. In Kubernetes, enabling the SupportPodPidsLimit feature gate in kubelet allows a pod-level PID limit.

/etc/security/limits.conf and sysctl.conf: the ulimit command affects only the current shell session and the processes started from it; persistent per-user limits go in limits.conf, and system-wide kernel limits go in sysctl.conf.

Experiment Comparison

Environment

Local: Ubuntu 16.04.6 LTS, Docker 18.09.7, base image alpine:v3.9

Kubernetes: kubelet v1.10.11.1, Docker 18.09.6

ulimit

User‑level resource limits have soft and hard values.

soft: the value the kernel actually enforces; the user can raise it, but not beyond the hard limit

hard: the ceiling for the soft limit; any user can lower it, but only root can raise it

There are two ways to change them: temporarily with the ulimit command, or permanently in /etc/security/limits.conf. The file is applied by the PAM module pam_limits.so when a user logs in.
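The soft/hard relationship can be observed from inside any process. The following is a minimal Python sketch using the standard resource module; it is not part of the original experiment, and the function name and the value 64 are illustrative assumptions. It lowers the soft nofile limit and then raises it back, which a process may do as long as the soft value stays at or below the hard limit.

```python
import resource

def lower_then_restore_soft(limit=resource.RLIMIT_NOFILE, new_soft=64):
    """Lower a soft limit, then restore it; return three (soft, hard) snapshots."""
    soft, hard = resource.getrlimit(limit)
    before = (soft, hard)

    # The soft value may never exceed the hard value.
    # RLIM_INFINITY means the hard limit imposes no cap.
    if hard == resource.RLIM_INFINITY:
        target = new_soft
    else:
        target = min(new_soft, hard)

    resource.setrlimit(limit, (target, hard))   # change the soft value only
    lowered = resource.getrlimit(limit)

    resource.setrlimit(limit, (soft, hard))     # back to the original soft value
    restored = resource.getrlimit(limit)
    return before, lowered, restored

if __name__ == "__main__":
    for label, pair in zip(("before", "lowered", "restored"),
                           lower_then_restore_soft()):
        print(label, pair)
```

The hard value is passed through unchanged in both calls, so the sketch should behave the same for root and for an unprivileged user.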

File Descriptor Limit

RLIMIT_NOFILE
    This specifies a value one greater than the maximum file descriptor number that can be opened by this process.
    Attempts to exceed this limit yield the error EMFILE.
    Since Linux 4.5, this limit also defines the maximum number of file descriptors that an unprivileged process may have "in flight" to other processes.

The nofile limit controls the maximum number of open files per process.
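The "one greater than the maximum file descriptor number" wording can be checked with a small Python sketch. This is an illustration under assumptions, not the article's test: the function name and the limit of 32 are made up for the example. It lowers the soft limit, opens /dev/null repeatedly, and records the errno of the first failure along with the highest descriptor number that was handed out.

```python
import errno
import os
import resource

def open_until_emfile(soft_limit=32):
    """Lower the soft RLIMIT_NOFILE, then open files until the kernel refuses."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)      # stay within the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    fds = []
    err = None
    try:
        # Bounded loop: at most soft_limit descriptors can ever be open.
        while len(fds) <= soft_limit:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        err = exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))

    highest = max(fds) if fds else None
    return err, highest

if __name__ == "__main__":
    err, highest = open_until_emfile()
    print("errno:", errno.errorcode.get(err), "highest fd:", highest)
```

On Linux, EMFILE is errno 24, the same number that appears in parentheses in the ab error message in the experiment.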

Start a container with nofile set to soft 100 / hard 200; the container runs as root by default.

$ docker run -d --ulimit nofile=100:200 cr.d.xiaomi.net/containercloud/alpine:webtool top

Inside the container, ulimit -a shows nofile soft 100.

# ulimit -a
-f: unlimited
-t: unlimited
-d: unlimited
-s: 8192
-c: unlimited
-m: unlimited
-l: 64
-p: unlimited
-n: 100
-v: unlimited
-w: unlimited
-e: 0
-r: 0

Running ab with 90 concurrent HTTP requests creates 90 sockets and works.

# ab -n 1000000 -c 90 http://61.135.169.125:80/ &
# lsof | wc -l
108
# lsof | grep -c ab
94

With 100 concurrent requests, ab fails with "No file descriptors available".

# ab -n 1000000 -c 100 http://61.135.169.125:80/
socket: No file descriptors available (24)

Thread Limit

RLIMIT_NPROC
    This is a limit on the number of extant processes (threads) for the real user ID of the calling process.
    If the limit is reached, fork(2) fails with EAGAIN.
    The limit is not enforced for processes with CAP_SYS_ADMIN or CAP_SYS_RESOURCE.

The nproc limit is counted per UID, across all processes owned by that UID, and is not enforced for the root account.
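Whether a given process is exempt can be estimated from /proc/<pid>/status. The Python sketch below is a hedged illustration: the function names are hypothetical, the sample mask in the usage note is only an example, and the UID check assumes user namespaces are not in use, so that UID 0 in the container really is host root. Current versions of the getrlimit(2) man page list real user ID 0 as exempt in addition to the two capabilities quoted above.

```python
import os

CAP_SYS_ADMIN = 21      # bit positions from <linux/capability.h>
CAP_SYS_RESOURCE = 24

def uid_and_caps(status_text):
    """Extract the real UID and effective capability mask from status text."""
    uid = caps = None
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])        # fields: real, effective, saved, fs
        elif line.startswith("CapEff:"):
            caps = int(line.split()[1], 16)   # hexadecimal bitmask
    return uid, caps

def nproc_exempt(uid, caps):
    """True if RLIMIT_NPROC would not be enforced (assumes no user namespace)."""
    bypass = (1 << CAP_SYS_ADMIN) | (1 << CAP_SYS_RESOURCE)
    return uid == 0 or bool(caps & bypass)

if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as fh:
        uid, caps = uid_and_caps(fh.read())
    print(f"uid={uid} CapEff={caps:#x} nproc_exempt={nproc_exempt(uid, caps)}")
```

For example, a status text with Uid 11 and CapEff 00000000a80425fb has neither bit 21 nor bit 24 set, so nproc_exempt returns False for it, while the same mask with UID 0 returns True.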

Container UID

All containers on the same host share the host kernel; Docker isolates PID, UTS, network, etc., via namespaces. User namespaces exist but are disabled by default.

$ docker run -d cr.d.xiaomi.net/containercloud/alpine:webtool top
# ps -ef | grep top
root 4096 4080 0 15:01 ? 00:00:01 top

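Inside the container, UID 0 is the same UID as root on the host, but the container runs with a reduced capability set and therefore has fewer privileges. The same holds for other UIDs: in the transcript below, UID 11 is named operator inside the container, while the final ps line, which appears to be taken on the host, lists the process under the host's name for that UID, app.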

# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)
# su operator
$ id
uid=11(operator) gid=0(root) groups=0(root)
$ sleep 100 &
$ ps -ef | grep 'sleep 100'
app 19302 19297 0 16:39 pts/0 00:00:00 sleep 100

Verifying ulimit Under Different Users

Start a container with nproc set to soft 10 / hard 20; the container runs as root by default.

$ docker run -d --ulimit nproc=10:20 cr.d.xiaomi.net/containercloud/alpine:webtool top
# ulimit -a
-p: processes 10
-n: file descriptors 1048576

As root, start 30 processes; they all succeed despite the soft limit of 10, because root is exempt. Then switch to operator and start more processes; the 11th fork fails with "Resource temporarily unavailable".

# su operator
$ for i in `seq 8`; do sleep 100 & done
$ sleep 100 &
sh: can't fork: Resource temporarily unavailable

Verifying ulimit Across Containers with Same UID

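Set nproc to soft 3 / hard 3, run as user operator, and start four containers. The fourth exits with an error, because the processes of the first three containers already count against the limit for that UID.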

$ docker run -d --ulimit nproc=3:3 --name nproc1 -u operator ...
$ docker run -d --ulimit nproc=3:3 --name nproc2 -u operator ...
$ docker run -d --ulimit nproc=3:3 --name nproc3 -u operator ...
$ docker run -d --ulimit nproc=3:3 --name nproc4 -u operator ...
# docker ps -a | grep nproc
... nproc4 Exited (1)

Summary

ulimit caps the number of file descriptors per process, and the cap applies to every user, root included.

ulimit caps the thread count per UID; root is exempt.

Because nproc is counted per UID across the whole host, a thread leak in one container can cause fork failures in every other container running under the same UID, which is a real risk in production.
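To see how close a shared UID is to its nproc limit, the per-UID thread totals can be approximated from /proc. The following Python sketch is an assumption-laden illustration, not a tool from the article: the function names are hypothetical, and it has to run on the host (not inside a container's PID namespace) to see the processes of all containers.

```python
import glob
from collections import Counter

def uid_and_threads(status_text):
    """Extract the real UID and thread count from /proc/<pid>/status text."""
    uid = threads = None
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])        # first value is the real UID
        elif line.startswith("Threads:"):
            threads = int(line.split()[1])
    return uid, threads

def threads_per_uid(proc_root="/proc"):
    """Sum thread counts per real UID over every visible process."""
    totals = Counter()
    for path in glob.glob(f"{proc_root}/[0-9]*/status"):
        try:
            with open(path) as fh:
                uid, threads = uid_and_threads(fh.read())
        except OSError:
            continue                          # process exited during the scan
        if uid is not None and threads is not None:
            totals[uid] += threads
    return totals

if __name__ == "__main__":
    for uid, count in threads_per_uid().most_common(5):
        print(f"uid {uid}: {count} threads")
```

Comparing these totals with the nproc value configured for the containers gives a rough idea of how much headroom a UID has left before forks start failing.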

cgroup

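The cgroup pids controller caps the number of tasks (processes and threads) in a group. Docker exposes it through --pids-limit, and kubelet exposes it through the SupportPodPidsLimit feature gate together with --pod-max-pids, which limits the total number of PIDs per pod.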

Docker: set --pids-limit at container start.

Kubelet: enable SupportPodPidsLimit=true and set --pod-max-pids=150.

# kubelet command line example
--feature-gates=SupportPodPidsLimit=true --pod-max-pids=150

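Testing shows that the limit is counted for the pod as a whole rather than per user: with root already running 100 threads, operator can add threads only until the total reaches 150, matching the cgroup limit.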

# cat /sys/fs/cgroup/pids/.../pids.current
150
# cat /sys/fs/cgroup/pids/.../pids.max
150
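The two files shown above can also be read programmatically. This is a minimal Python sketch with hypothetical function names; the cgroup directory is left as a parameter because its exact path depends on the container and is elided in the transcript. The kernel writes the literal string max to pids.max when no limit is set.

```python
import os

def read_pids_limit(cgroup_dir):
    """Return (current, maximum) for a pids cgroup; maximum is None if unlimited."""
    with open(os.path.join(cgroup_dir, "pids.current")) as fh:
        current = int(fh.read().strip())
    with open(os.path.join(cgroup_dir, "pids.max")) as fh:
        raw = fh.read().strip()
    maximum = None if raw == "max" else int(raw)
    return current, maximum

def headroom(current, maximum):
    """Number of additional tasks that can be created, or None if unlimited."""
    if maximum is None:
        return None
    return max(maximum - current, 0)
```

With the values from the transcript (150 and 150) the headroom is 0, which is the state in which further fork or thread creation attempts in the pod fail.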

cgroup PID limits effectively restrict thread count. Docker sets the limit per container, while kubelet's --pod-max-pids is a node-level setting that applies the same limit to every pod on that node.

limits.conf / sysctl.conf

limits.conf configures ulimit values; files in /etc/security/limits.d/ override it. sysctl.conf sets machine-level limits; drop-in files in /etc/sysctl.d/ are read alongside it. Adding entries to /etc/sysctl.conf can adjust fs.file-max and kernel.pid_max.

# Inside container, attempting to modify sysctl fails (read‑only filesystem)
# echo "fs.file-max=5" >> /etc/sysctl.conf
# sysctl -p
sysctl: error setting key 'fs.file-max': Read-only file system

Running a privileged container shows that changes to /proc/sys/kernel/pid_max affect the host, indicating Docker’s isolation is not complete for sysctl.

# docker exec -it ... sh
# echo 50000 > /proc/sys/kernel/pid_max
# cat /proc/sys/kernel/pid_max
50000
# host also shows 50000

In short: limits.conf can be used inside containers in the same way as ulimit, whereas sysctl settings such as kernel.pid_max are not isolated per container, so changing them affects the whole host.
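Since these values are shared with the host, an unprivileged container can at least observe them. The sketch below is a small Python illustration with a hypothetical function name; it reads the two settings mentioned above from /proc/sys and returns None for any it cannot read. The proc_sys parameter exists only so the function can be pointed at a different directory for testing.

```python
import os

def read_host_limits(proc_sys="/proc/sys"):
    """Read kernel.pid_max and fs.file-max; unreadable entries map to None."""
    keys = {
        "kernel.pid_max": "kernel/pid_max",
        "fs.file-max": "fs/file-max",
    }
    values = {}
    for name, rel_path in keys.items():
        try:
            with open(os.path.join(proc_sys, rel_path)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values

if __name__ == "__main__":
    for name, value in read_host_limits().items():
        print(f"{name} = {value}")
```

Running it in two containers on the same host should print the same numbers, which matches the observation that these settings are not isolated per container.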

Conclusion
