How to Securely Run Business Containers as Non‑Root: Practical Docker & Kubernetes Techniques
This article shares practical experience on converting business containers to run without root privileges, covering the importance of non‑root execution, essential security concepts, Dockerfile USER settings, entrypoint scripts, handling machine‑UUID access, and concrete examples for CoreDNS, Consul, MySQL, Redis, etc.
Origin
Customer security requirements mandate that business containers run without root. Many of these services need ipset or iptables operations, which pure rootless Docker cannot provide, so the goal is narrower: make every business container process run as a non-root user.
A previous article, "Container is fast but not safe, Rootless is the answer", introduced the risks of running Docker as root.
Transformation
Prerequisite Knowledge
Basic concepts include why using root is unsafe and examples of root risks.
Examples of Root Insecurity
Linux provides user namespaces, but Docker, unlike Podman, does not support per-container UID mapping, so root inside a container can still modify files on mounted host paths. An accidental rm -rf * is enough, as in this session:
docker run --rm -v /mnt/sda1:/mnt/sda1 -it alpine
cp /mnt/sda1/somefile.tar.gz .
tar xzvf somefile.tar.gz
cd somefile-v1.0
ls
# ...
cd ..
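rm -rf *
Alpine's default working directory is /, so cd .. lands in / and rm -rf * behaves like rm -rf /*, deleting everything, including the host files mounted at /mnt/sda1. Business containers must therefore run their processes with minimal privileges.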
Choosing USER vs docker‑entrypoint.sh
Set USER in Dockerfile or use -u user:group at run time for simple processes (e.g., exporters). Examples include:
danielqsj/kafka_exporter
ClickHouse/clickhouse_exporter
kubernetes addonresizer
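For simple processes where USER or -u is all that is needed, an entrypoint can also refuse to start when it finds itself running as root. The following is a minimal sketch of such a guard, not something the exporters listed above are known to ship; the function name require_non_root is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeeds only when the given UID is not 0.
# With no argument it checks the UID of the current process.
require_non_root() {
    uid=${1:-$(id -u)}
    if [ "$uid" = "0" ]; then
        echo "refusing to run as root (uid 0)" >&2
        return 1
    fi
    return 0
}

# Usage inside an entrypoint script:
#   require_non_root || exit 1
#   exec "$@"
```

Called before exec, the guard turns a forgotten USER line or a missing -u flag into an immediate startup failure rather than a silently privileged process.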
For containers with persistent data directories (e.g., MySQL, Redis), setting USER alone is insufficient: the ownership of the data directory must match the UID/GID the container runs as, and it would have to be adjusted before the container starts in each of these mount scenarios:
Direct -v mount or Docker volume
K8s hostPath
Fixed PV
PVC under a StorageClass
Deploying on another K8s cluster
Modifying directory permissions ahead of time can break automation, especially when upstream images change UID/GID. Therefore, entrypoint scripts are preferred.
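The mismatch an entrypoint script has to detect can be sketched as a simple ownership check. This is an illustration under assumptions: the function name owner_matches is hypothetical, and stat -c is the GNU/BusyBox form of the command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when directory $1 is owned by numeric UID $2.
# Uses `stat -c`, which is available in GNU coreutils and BusyBox.
owner_matches() {
    dir=$1
    want_uid=$2
    [ "$(stat -c '%u' "$dir")" = "$want_uid" ]
}

# Usage inside an entrypoint, with a placeholder path and UID:
#   owner_matches /data 999 || echo "ownership of /data needs fixing" >&2
```

Running the check at container start, rather than fixing permissions on the host beforehand, keeps the logic next to the image that defines the UID.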
The official MySQL image creates a dedicated mysql user with a fixed UID/GID and starts via ENTRYPOINT plus CMD (or K8s command / args):
docker-entrypoint.sh mysqld
Redis example entrypoint script (simplified):
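#!/bin/sh
set -e

# First argument is a flag (-x) or a *.conf file: prepend redis-server.
if [ "${1#-}" != "$1" ] || [ "${1%.conf}" != "$1" ]; then
    set -- redis-server "$@"
fi

# Started as root: fix ownership under the working directory,
# then re-run this script as the redis user.
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
    find . \! -user redis -exec chown redis '{}' +
    exec gosu redis "$0" "$@"
fi

# set appropriate umask
um="$(umask)"
if [ "$um" = '0022' ]; then
    umask 0077
fi

exec "$@"
docker top shows the container's processes with the UID as the host sees it. gosu (or su-exec) switches to the non-root user and then execs the target, so the service keeps its PID and receives signals directly rather than through a wrapper process.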
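Why exec matters for signal handling can be seen without gosu at all: exec replaces the current process instead of forking a child, so the PID does not change. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Prints the PID of a shell, then replaces that shell with a new one via
# exec and prints the PID again. exec does not fork, so both lines match.
same_pid_after_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Usage:
#   same_pid_after_exec    # prints the same PID twice
```

In a container the same mechanism keeps the service as PID 1, so a stop signal sent to the container reaches the service itself.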
Case Studies
Key practices:
Place PID and socket files under /tmp
Grant write permission to /dev/std* if needed
Create users with a fixed uid:gid so that ownership on the host matches the image
Avoid chmod -R 777 on directories
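As a minimal sketch of the first and last practices, the function below creates a runtime directory for PID and socket files with a narrow mode instead of 777. The function name and the app-run directory name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: creates <base>/app-run for PID and socket files,
# restricts it to owner and group (mode 0750), and prints the path.
# <base> defaults to /tmp.
prepare_runtime_dir() {
    base=${1:-/tmp}
    dir="$base/app-run"
    mkdir -p "$dir"
    chmod 0750 "$dir"
    printf '%s\n' "$dir"
}

# Usage:
#   run_dir=$(prepare_runtime_dir)
#   pidfile="$run_dir/service.pid"
```

Because the directory is created by the non-root user that runs the service, no recursive permission change on a shared path is needed.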
Machine‑UUID Handling
Reading the machine UUID via dmidecode -s system-uuid fails in containers without root. Instead, read /sys/firmware/dmi/tables/DMI after granting read permission, or use a Go library to parse SMBIOS data.
package main

import (
	"fmt"
	"log"

	"github.com/digitalocean/go-smbios/smbios"
)

func main() {
	// Open the SMBIOS/DMI data exposed by the operating system.
	rc, _, err := smbios.Stream()
	if err != nil {
		log.Fatalf("failed to open stream: %v", err)
	}
	defer rc.Close()

	d := smbios.NewDecoder(rc)
	ss, err := d.Decode()
	if err != nil {
		log.Fatalf("failed to decode structures: %v", err)
	}

	for _, s := range ss {
		// Type 1 is System Information; its UUID occupies Formatted[4:20].
		if s.Header.Type != 1 || len(s.Formatted) < 20 {
			continue
		}
		f := s.Formatted
		// The first three fields are byte-swapped; %02X keeps leading zeros.
		fmt.Printf("UUID: %02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X\n",
			f[7], f[6], f[5], f[4],
			f[9], f[8], f[11], f[10],
			f[12], f[13], f[14], f[15],
			f[16], f[17], f[18], f[19])
	}
}
Mount the host /sys/firmware/dmi/tables into the container and adjust its read permissions before invoking the Go binary.
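The index shuffling in the Go program reflects how the UUID is stored: the first three fields are little-endian, the remaining bytes are in order. This sketch assumes the SMBIOS 2.6 and later encoding. It applies the same reordering to a 32-character hex dump of the 16 raw bytes; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: takes the 16 raw UUID bytes as 32 hex characters
# (in the order they are stored) and prints the UUID with the first three
# fields byte-swapped, mirroring the index order of the Go program.
smbios_uuid_from_hex() {
    hex=$1
    [ "${#hex}" -eq 32 ] || return 1
    # Split the string into 16 two-character bytes: $1 .. ${16}.
    set -- $(printf '%s' "$hex" | sed 's/../& /g')
    printf '%s%s%s%s-%s%s-%s%s-%s%s-%s%s%s%s%s%s\n' \
        "$4" "$3" "$2" "$1" \
        "$6" "$5" "$8" "$7" \
        "$9" "${10}" "${11}" "${12}" \
        "${13}" "${14}" "${15}" "${16}"
}

# Usage:
#   smbios_uuid_from_hex 00112233445566778899AABBCCDDEEFF
#   # prints 33221100-5544-7766-8899-AABBCCDDEEFF
```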
CoreDNS
CoreDNS 1.11.0 supports non‑root, but the project uses 1.10.1. A custom image adds a non‑root user and sets the capability cap_net_bind_service to allow binding port 53.
ARG DEBIAN_IMAGE=debian:stable-slim
ARG BASE=gcr.io/distroless/static-debian12:nonroot

FROM coredns/coredns:1.10.1 AS bin

FROM ${DEBIAN_IMAGE} AS build
SHELL ["/bin/sh", "-ec"]
RUN export DEBCONF_NONINTERACTIVE_SEEN=true DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical TERM=linux \
    && apt-get -qq update && apt-get -yyqq upgrade && apt-get -yyqq install ca-certificates libcap2-bin && apt-get clean
COPY --from=bin /coredns /coredns
RUN setcap cap_net_bind_service=+ep /coredns

FROM ${BASE}
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /coredns /coredns
USER nonroot:nonroot
EXPOSE 53 53/udp
ENTRYPOINT ["/coredns"]
Build the image with BuildKit so that the file capability set by setcap is preserved; otherwise the binary cannot bind to port 53.
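The capability is needed because port 53 lies below the kernel's unprivileged port threshold, which defaults to 1024 and is exposed as the sysctl net.ipv4.ip_unprivileged_port_start. As a rough sketch, a script could check whether a given port falls under that threshold; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when binding <port> as a non-root user
# would require CAP_NET_BIND_SERVICE. The threshold is taken from the
# optional second argument, else from the sysctl file, else 1024.
needs_bind_capability() {
    port=$1
    start=${2:-}
    if [ -z "$start" ]; then
        if [ -r /proc/sys/net/ipv4/ip_unprivileged_port_start ]; then
            start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start)
        else
            start=1024
        fi
    fi
    [ "$port" -lt "$start" ]
}

# Usage:
#   needs_bind_capability 53 && echo "port 53 needs cap_net_bind_service"
```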
Consul
The official Consul image runs as root; to avoid root, modify the entrypoint to use chown -R and drop dumb‑init so the PID 1 process runs as the non‑root user.
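The Dockerfile below patches the entrypoint with two sed expressions. As a minimal sketch of what they do, the same expressions can be run against a stand-in script. The two sample lines in the usage comment are assumptions made for illustration, not a copy of the real entrypoint. The -r flag and \s are GNU sed syntax.

```shell
#!/bin/sh
# Applies the two substitutions from the Consul Dockerfile to stdin:
#   1. add -R to "chown consul:..." calls
#   2. strip the dumb-init prefix from the first (shebang) line
patch_entrypoint() {
    sed -r -e 's/(chown)(\s+consul:)/\1 -R\2/' \
           -e '1s@/usr/bin/dumb-init\s+@@'
}

# Usage with a stand-in script:
#   printf '%s\n' '#!/usr/bin/dumb-init /bin/sh' \
#                 'chown consul:consul /consul/data' | patch_entrypoint
#   # prints:
#   #   #!/bin/sh
#   #   chown -R consul:consul /consul/data
```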
ARG VER=1.8.3
FROM consul:${VER}
RUN sed -ri -e 's/(chown)(\s+consul:)/\1 -R\2/' \
    -e '1s@/usr/bin/dumb-init\s+@@' /usr/local/bin/docker-entrypoint.sh
Docker Socket Access
Processes needing the host Docker socket (e.g., cAdvisor) must run as a user that belongs to the socket's group. The entrypoint script adds the user to the appropriate group and then execs the target binary via gosu or su‑exec.
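#!/bin/sh
set -e

D_SOCK=${D_SOCK:-/var/run/docker.sock}

# First argument is a flag: prepend the cadvisor binary.
# ${1#-} is the POSIX equivalent of the ${1:0:1} substring test.
if [ "${1#-}" != "$1" ]; then
    set -- cadvisor "$@"
fi

if [ "$1" = 'cadvisor' ]; then
    if [ "$(id -u)" = '0' ] && [ -n "$RUN_USER" ]; then
        if [ -S "$D_SOCK" ]; then
            # Make sure a group with the socket's GID exists in the container.
            group_id=$(stat -c '%g' "$D_SOCK")
            getent group | cut -d: -f3 | grep -qw "$group_id" || addgroup -g "$group_id" docker
            # Add RUN_USER to that group unless it is already a member.
            group_name=$(stat -c '%G' "$D_SOCK")
            id -nG "$RUN_USER" | grep -qw "$group_name" || adduser "$RUN_USER" "$group_name"
        fi
        exec gosu "$RUN_USER" "$@"
    fi
fi

exec "$@"
Cron Replacement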
Non‑root containers cannot use the traditional cron daemon; instead, go‑crond can be used.
