30× Faster OCI Container Startup: Kernel and Seccomp Tweaks Explained
By profiling the OCI runtime with hyperfine and applying a series of kernel patches, seccomp hash optimizations, and BPF compilation caching, the author reduced container start‑up time from roughly 160 ms in 2017 to just over 5 ms today—a near 30‑fold speedup.
Background and Goal
The author, a Red Hat engineer, set out to accelerate OCI container start‑up by improving the OCI runtime, which is responsible for the final kernel interactions that establish a container's environment.
2017 Baseline Measurements
Using hyperfine on Fedora 24 (kernel 4.5.5), the baseline benchmark for crun run foo was:
# hyperfine 'crun run foo'
Benchmark 1: 'crun run foo'
Time (mean ± σ): 159.2 ms ± 21.8 ms [User: 43.0 ms, System: 16.3 ms]
Range (min … max): 73.9 ms … 194.9 ms    39 runs

Running the same command without a seccomp profile cut the user time to 4.1 ms and the overall time to 139.6 ms, revealing two major cost centers: high system time and libseccomp-driven user time.
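hyperfine's measurement loop is conceptually simple: fork and exec the command repeatedly, collect wall-clock samples, and report their mean and standard deviation. A minimal sketch of that idea (not hyperfine's actual implementation, which also does warmup runs and outlier detection):

```python
import statistics
import subprocess
import time

def benchmark(cmd, runs=20):
    """Run `cmd` repeatedly and return (mean_ms, stdev_ms) of wall-clock time."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Example: mean_ms, sigma_ms = benchmark(["crun", "run", "foo"])
```

Each sample includes fork/exec overhead, which is why short-lived commands like `unshare -n true` show a wide min…max range.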
System‑Time Reductions
Network Namespace Creation/Destruction
Originally, creating and destroying a network namespace took ~48 ms:
# hyperfine 'unshare -n true'
Benchmark 1: 'unshare -n true'
Time (mean ± σ): 47.7 ms ± 51.4 ms [User: 0.6 ms, System: 3.2 ms]
Range (min … max): 0.0 ms … 190.5 ms    365 runs

Kernel patches by Florian Westphal (merged in Linux 5.19) replaced costly synchronize_net() calls with call_rcu(), dropping the time to about 1.5 ms:
# hyperfine 'unshare -n true'
Benchmark 1: 'unshare -n true'
Time (mean ± σ): 1.5 ms ± 0.5 ms [User: 0.3 ms, System: 1.3 ms]
Range (min … max): 0.8 ms … 6.7 ms    1907 runs

mqueue Mount
Mounting mqueue originally cost ~16.8 ms. A patch by Al Viro introduced on‑demand mount creation, reducing the benchmark to ~0.7 ms:
# hyperfine 'unshare --propagation=private -m mount -t mqueue mqueue /tmp/mqueue'
Benchmark 1: 'unshare --propagation=private -m mount -t mqueue mqueue /tmp/mqueue'
Time (mean ± σ): 0.7 ms ± 0.5 ms [User: 0.5 ms, System: 0.6 ms]
Range (min … max): 0.0 ms … 3.1 ms    772 runs

IPC Namespace Creation/Destruction
Initial IPC namespace benchmark was ~10.9 ms. A patch accepted in 2020 (by Giuseppe Scrivano) used a work‑queue to free IPC resources, bringing the time down to ~0.1 ms on a modern kernel:
# hyperfine 'unshare -i true'
Benchmark 1: 'unshare -i true'
Time (mean ± σ): 0.1 ms ± 0.2 ms [User: 0.2 ms, System: 0.4 ms]
Range (min … max): 0.0 ms … 1.5 ms    1966 runs

User-Time Optimizations (libseccomp)
The majority of user‑time was spent in libseccomp resolving syscall names via a linear strcmp search. The original complexity was O(n·m) (n = number of syscalls in the profile, m = total known syscalls).
A patch introduced in January 2020 replaced the linear search with a perfect hash generated by gperf, making each name lookup O(1) and reducing the total resolution cost for a profile of n syscalls to O(n). Benchmarks after the change showed:
# hyperfine 'crun run foo'
Benchmark 1: 'crun run foo'
Time (mean ± σ): 28.9 ms ± 5.9 ms [User: 16.7 ms, System: 4.5 ms]
Range (min … max): 19.1 ms … 41.6 ms    73 runs

Compared to the no-seccomp baseline of 4.1 ms user time, the seccomp overhead dropped from ~38.9 ms to ~12.6 ms, a three-fold improvement.
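The shape of that change can be illustrated with a toy resolver. The table below is a tiny illustrative subset, not libseccomp's real syscall table, and the dict stands in for the gperf-generated perfect hash:

```python
# Toy subset of a syscall table; the real table has several hundred entries.
SYSCALLS = [("read", 0), ("write", 1), ("open", 2), ("close", 3), ("clone", 56)]

def resolve_linear(name):
    """O(m) per name: the pre-2020 approach, a strcmp-style scan of the table."""
    for entry, nr in SYSCALLS:
        if entry == name:
            return nr
    return -1

# A perfect hash (as generated by gperf) answers each lookup in O(1);
# a plain dict is the closest Python analogue of that behavior.
SYSCALL_INDEX = dict(SYSCALLS)

def resolve_hashed(name):
    return SYSCALL_INDEX.get(name, -1)
```

Resolving a profile of n names thus drops from n scans of an m-entry table to n constant-time lookups.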
BPF Filter Compilation Caching
Compiling the BPF filter via seccomp_export_bpf remained expensive. Since most containers reuse the same seccomp profile, caching compiled filters and reusing them when possible cuts this cost dramatically. An experimental runtime feature (not yet merged) demonstrated a total run time of ~5.6 ms for the same container command.
Conclusion
Over five years, cumulative kernel, seccomp, and runtime improvements have reduced the total OCI container creation and destruction time from nearly 160 ms to just over 5 ms—a roughly 30× speedup.
