Mastering Process Resource Limits in Docker & Kubernetes: ulimit and cgroup Explained
This article explains how Linux ulimit and cgroup mechanisms can be used to restrict file descriptors, memory, and thread counts in containerized environments, compares Docker and Kubernetes configurations, presents experimental results on fd and thread limits, and offers practical recommendations for configuring default‑ulimits, pod‑max‑pids, and system limits.
Background
In Linux, the ulimit command limits resource usage of processes (file descriptors, thread count, memory, etc.). In containerized environments, similar limits are needed.
Limitation Methods
ulimit: Docker supports --ulimit at container start and can set default limits via default-ulimits in the daemon; Kubernetes does not currently support ulimit.
cgroup: Docker supports memory, CPU, PID limits; thread limits can be set with --pids-limit. Kubernetes can enable SupportPodPidsLimit in kubelet to set pod‑level PID limits.
/etc/security/limits.conf and sysctl.conf: the ulimit command affects only the current shell session and the processes it starts; persistent per‑user limits go in limits.conf, and system‑wide kernel limits in sysctl.conf.
Experiment Comparison
Environment
Local: Ubuntu 16.04.6 LTS, Docker 18.09.7, base image alpine:v3.9
Kubernetes: kubelet v1.10.11.1, Docker 18.09.6
ulimit
User‑level resource limits have soft and hard values.
soft: can be raised by the user but not beyond the hard limit
hard: only root can raise
Modification methods: temporary via ulimit command, permanent via /etc/security/limits.conf. The mechanism works through PAM modules loading pam_limits.so on login.
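The soft/hard relationship can be tried without a container. The following is a minimal sketch, assuming a Linux shell whose current hard nofile limit is at least 200. The function name soft_hard_demo is hypothetical, and the values 100 and 200 are chosen only for illustration. The body runs in a subshell so the caller's own limits are left alone.

```shell
#!/bin/sh
# soft_hard_demo: hypothetical helper showing how soft and hard limits interact.
soft_hard_demo() (
  # Subshell body: limits changed here do not leak into the calling shell.
  ulimit -Sn 100 || exit 1   # lower the soft limit first (soft may never exceed hard)
  ulimit -Hn 200 || exit 1   # then lower the hard limit; a non-root user cannot raise it again
  echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
  # Raising the soft limit is allowed as long as it stays at or below the hard limit.
  ulimit -Sn 150 && echo "soft raised to $(ulimit -Sn)"
  # Asking for a soft limit above the hard limit is rejected.
  ulimit -Sn 300 2>/dev/null || echo "soft cannot exceed hard"
)

soft_hard_demo
```

The last request is rejected for every user, root included, because a soft limit above the hard limit is invalid. Privilege is needed only to raise the hard limit itself.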
File Descriptor Limit
RLIMIT_NOFILE
This specifies a value one greater than the maximum file descriptor number that can be opened by this process.
Attempts to exceed this limit yield the error EMFILE.
Since Linux 4.5, this limit also defines the maximum number of file descriptors that an unprivileged process may have "in flight" to other processes.
The nofile limit controls the maximum number of open files per process.
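To see how close a running process is to this limit, the kernel's /proc interface can be read directly. The sketch below assumes a Linux /proc filesystem and permission to read the target process's entries. fd_usage is a hypothetical helper name, not part of any tool used in this article.

```shell
#!/bin/sh
# fd_usage PID: print "<open> <soft> <hard>" for one process (hypothetical helper).
fd_usage() {
  pid=$1
  # Each entry under /proc/PID/fd is one open descriptor.
  open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # /proc/PID/limits has a row of the form: "Max open files <soft> <hard> files".
  limits=$(awk '/^Max open files/ { print $4, $5 }' "/proc/$pid/limits") || return 1
  echo "$((open + 0)) $limits"
}

# Example: inspect the current shell.
fd_usage $$
```

The three numbers are the descriptors currently open, the soft limit, and the hard limit. Counting entries under /proc/PID/fd covers only numbered descriptors, whereas lsof also lists items such as the working directory and memory‑mapped files, so lsof line counts tend to run higher than the number of sockets a program opened.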
Set nofile to soft 100 / hard 200 and start the container as root (the default).
$ docker run -d --ulimit nofile=100:200 cr.d.xiaomi.net/containercloud/alpine:webtool top
Inside the container, ulimit -a shows a nofile soft limit of 100.
# ulimit -a
-f: unlimited
-t: unlimited
-d: unlimited
-s: 8192
-c: unlimited
-m: unlimited
-l: 64
-p: unlimited
-n: 100
-v: unlimited
-w: unlimited
-e: 0
-r: 0
Running ab with 90 concurrent HTTP requests creates 90 sockets and works.
# ab -n 1000000 -c 90 http://61.135.169.125:80/ &
# lsof | wc -l
108
# lsof | grep -c ab
94
With 100 concurrent requests, ab fails with "No file descriptors available".
# ab -n 1000000 -c 100 http://61.135.169.125:80/
socket: No file descriptors available (24)
Thread Limit
RLIMIT_NPROC
This is a limit on the number of extant processes (threads) for the real user ID of the calling process.
If the limit is reached, fork(2) fails with EAGAIN.
The limit is not enforced for processes with CAP_SYS_ADMIN or CAP_SYS_RESOURCE.
The nproc limit applies per UID and does not take effect for the root account.
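Because the count is kept per real UID across the whole kernel, it can help to see how many threads each UID currently owns on a host. This is a minimal sketch, assuming a Linux /proc filesystem. threads_per_uid is a hypothetical helper, and processes hidden from the caller (for example by hidepid mount options) are simply not counted.

```shell
#!/bin/sh
# threads_per_uid: print "<real-uid> <thread-count>" for each UID (hypothetical helper).
threads_per_uid() {
  for f in /proc/[0-9]*/status; do
    # A process may exit between the glob and the read; skip it quietly.
    cat "$f" 2>/dev/null
  done | awk '
    /^Uid:/     { uid = $2 }           # first value on the Uid line is the real UID
    /^Threads:/ { count[uid] += $2 }   # Threads follows Uid within each status file
    END         { for (u in count) print u, count[u] }
  '
}

threads_per_uid | sort -n
```

Containers that run under the same numeric UID contribute to the same row, which is why one container's thread usage can affect another's ability to fork.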
Container UID
All containers on the same host share the host kernel; Docker isolates PID, UTS, network, etc., via namespaces. User namespaces exist but are disabled by default.
$ docker run -d cr.d.xiaomi.net/containercloud/alpine:webtool top
# ps -ef | grep top
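root 4096 4080 0 15:01 ? 00:00:01 top
Inside the container, UID 0 maps to host root, but capabilities differ, giving fewer privileges. A non‑root user keeps its numeric UID on the host, where it can resolve to a different name: operator (UID 11) in the container appears as app in the process listing below.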
# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)
# su operator
$ id
uid=11(operator) gid=0(root) groups=0(root)
$ sleep 100 &
$ ps -ef | grep 'sleep 100'
app 19302 19297 0 16:39 pts/0 00:00:00 sleep 100
Verifying ulimit Under Different Users
Set ulimit nproc soft 10 / hard 20, default start as root.
$ docker run -d --ulimit nproc=10:20 cr.d.xiaomi.net/containercloud/alpine:webtool top
# ulimit -a
-p: processes 10
-n: file descriptors 1048576
As root, start 30 processes; root is not subject to the nproc limit. Then switch to operator and start more processes: the fork that would take operator past the soft limit of 10 fails with "Resource temporarily unavailable".
# su operator
$ for i in $(seq 8); do sleep 100 & done
$ sleep 100 &
sh: can't fork: Resource temporarily unavailable
Verifying ulimit Across Containers with Same UID
Set ulimit nproc soft 3 / hard 3 for user operator and start four containers; the fourth exits with error.
$ docker run -d --ulimit nproc=3:3 --name nproc1 -u operator ...
$ docker run -d --ulimit nproc=3:3 --name nproc2 -u operator ...
$ docker run -d --ulimit nproc=3:3 --name nproc3 -u operator ...
$ docker run -d --ulimit nproc=3:3 --name nproc4 -u operator ...
# docker ps -a | grep nproc
... nproc4 Exited (1)
Summary
The nofile limit caps open file descriptors per process and applies to every user, including root.
The nproc limit caps thread count per UID; root is exempt.
In production, a per‑UID nproc limit can cause fork failures in otherwise healthy containers when several containers share a UID and one of them leaks threads.
cgroup
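The cgroup pids controller caps the number of tasks (processes and threads) in a group. Docker exposes it as --pids-limit; kubelet exposes it through the SupportPodPidsLimit feature gate together with --pod-max-pids, which limits the total PIDs per pod.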
Docker: set --pids-limit at container start.
Kubelet: enable SupportPodPidsLimit=true and set --pod-max-pids=150.
# kubelet command line example
--feature-gates=SupportPodPidsLimit=true --pod-max-pids=150
In testing, root spawned 100 threads, and operator could then add threads only until the pod total reached 150, matching the cgroup limit.
# cat /sys/fs/cgroup/pids/.../pids.current
150
# cat /sys/fs/cgroup/pids/.../pids.max
150
cgroup PID limits effectively restrict thread count. Docker sets the limit per container, while kubelet applies one per‑pod limit to every pod on the node.
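The two files read in that test can be combined into a quick headroom check. This is a sketch only: pids_headroom is a hypothetical helper, and the cgroup directory is passed in by the caller because its location depends on the cgroup layout and container runtime.

```shell
#!/bin/sh
# pids_headroom DIR: summarize pids.current and pids.max found in DIR (hypothetical helper).
pids_headroom() {
  dir=$1
  current=$(cat "$dir/pids.current") || return 1
  max=$(cat "$dir/pids.max") || return 1
  if [ "$max" = "max" ]; then   # the kernel reports "max" when no limit is set
    echo "current=$current max=unlimited"
  else
    echo "current=$current max=$max free=$((max - current))"
  fi
}

# Demo against a stand-in directory holding the values from the test;
# on a real node, pass the pod's or container's pids cgroup directory instead.
demo=$(mktemp -d)
echo 150 > "$demo/pids.current"
echo 150 > "$demo/pids.max"
pids_headroom "$demo"
rm -rf "$demo"
```

A free value of 0 means the next fork or thread creation in that cgroup will be refused.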
limits.conf / sysctl.conf
limits.conf configures ulimit values; files in /etc/security/limits.d/ override it. sysctl.conf sets machine‑level limits; files in /etc/sysctl.d/ are applied alongside it. Adding entries to /etc/sysctl.conf can adjust fs.file-max and kernel.pid_max.
# Inside container, attempting to modify sysctl fails (read‑only filesystem)
# echo "fs.file-max=5" >> /etc/sysctl.conf
# sysctl -p
sysctl: error setting key 'fs.file-max': Read-only file system
Running a privileged container shows that changes to /proc/sys/kernel/pid_max affect the host, indicating Docker’s isolation is not complete for sysctl.
# docker exec -it ... sh
# echo 50000 > /proc/sys/kernel/pid_max
# cat /proc/sys/kernel/pid_max
50000
# host also shows 50000
Conclusion: limits.conf can be used inside containers similarly to ulimit; sysctl values such as kernel.pid_max are shared with the host, so changing them from a privileged container changes the host.
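Since these values are shared with the host, it is worth reading them before deciding whether to change anything. The following is a read‑only sketch, assuming a Linux /proc/sys tree; show_node_limits is a hypothetical helper name.

```shell
#!/bin/sh
# show_node_limits: print the two machine-level limits discussed above (hypothetical helper).
show_node_limits() {
  for key in kernel/pid_max fs/file-max; do
    name=$(echo "$key" | tr / .)   # kernel/pid_max -> kernel.pid_max
    if [ -r "/proc/sys/$key" ]; then
      echo "$name=$(cat "/proc/sys/$key")"
    else
      echo "$name=n/a"             # not exposed in this environment
    fi
  done
}

show_node_limits
```

Run inside an unprivileged container and on the host, the helper is expected to report the same numbers, because neither setting is isolated per container in the experiment above.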
Conclusion
Recommended solutions:
FD limit: modify Docker daemon default-ulimits to restrict process‑level file descriptors.
Thread limit: configure kubelet with --feature-gates=SupportPodPidsLimit=true and --pod-max-pids to limit PIDs via cgroup.
Other considerations: adjust the node's kernel.pid_max, and relax nproc limits for non‑root users in images.
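As an illustration of the first recommendation, the sketch below writes a daemon configuration fragment with a default nofile limit. The helper name write_default_ulimits and the value 65535 are assumptions chosen for the example, not figures from the experiments above. On a typical Linux install the real file is /etc/docker/daemon.json, any existing keys there need to be merged rather than overwritten, and the daemon must be restarted for the change to apply. The demo writes to a temporary file so nothing on the system is modified.

```shell
#!/bin/sh
# write_default_ulimits FILE SOFT HARD: write a default-ulimits fragment (hypothetical helper).
write_default_ulimits() {
  target=$1 soft=$2 hard=$3
  cat > "$target" <<EOF
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Soft": $soft, "Hard": $hard }
  }
}
EOF
}

# Demo: 65535 is an illustrative value only; pick one that suits the workload.
tmp=$(mktemp)
write_default_ulimits "$tmp" 65535 65535
cat "$tmp"
rm -f "$tmp"
```

A container can still override this default by passing --ulimit when it starts.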
MaGe Linux Operations
