
30× Faster OCI Container Startup: Kernel and Seccomp Tweaks Explained

By profiling the OCI runtime with hyperfine and applying a series of kernel patches, seccomp hash optimizations, and BPF compilation caching, the author reduced container start‑up time from roughly 160 ms in 2017 to just over 5 ms today—a near 30‑fold speedup.


Background and Goal

The author, a Red Hat engineer, set out to accelerate OCI container start‑up by improving the OCI runtime, which is responsible for the final kernel interactions that establish a container's environment.

2017 Baseline Measurements

Using hyperfine on Fedora 24 (kernel 4.5.5), the baseline benchmark for crun run foo was:

# hyperfine 'crun run foo'
Benchmark 1: 'crun run foo'
  Time (mean ± σ):     159.2 ms ±  21.8 ms    [User: 43.0 ms, System: 16.3 ms]
  Range (min … max):    73.9 ms … 194.9 ms    39 runs

Running the same command without a seccomp profile cut user time to 4.1 ms and mean total time to 139.6 ms, revealing two major cost centers: kernel-side system time (namespace and mount setup and teardown) and libseccomp-driven user time.
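The split between those two cost centers follows from a little arithmetic on the numbers above (the breakdown is a back-of-the-envelope decomposition for illustration, not the author's own calculation):

```python
# Decomposition of the 2017 baseline measurements (all values in ms).
total_with_seccomp = 159.2     # mean total of `crun run foo`
user_with_seccomp = 43.0       # user time with the default seccomp profile
user_without_seccomp = 4.1     # user time with seccomp disabled
total_without_seccomp = 139.6  # mean total with seccomp disabled

# User-time overhead attributable to libseccomp: ~38.9 ms.
seccomp_user_overhead = user_with_seccomp - user_without_seccomp

# Even with seccomp off, ~139.6 ms remains -- that residual is the
# kernel-side cost (namespaces, mounts) targeted in the next sections.
print(f"seccomp user-time overhead: {seccomp_user_overhead:.1f} ms")
print(f"remaining non-seccomp cost: {total_without_seccomp:.1f} ms")
```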

System‑Time Reductions

Network Namespace Creation/Destruction

Originally, creating and destroying a network namespace took ~48 ms:

# hyperfine 'unshare -n true'
Benchmark 1: 'unshare -n true'
  Time (mean ± σ):      47.7 ms ±  51.4 ms    [User: 0.6 ms, System: 3.2 ms]
  Range (min … max):      0.0 ms … 190.5 ms    365 runs

Kernel patches by Florian Westphal (merged in Linux 5.19) replaced costly synchronize_net() calls with call_rcu(), dropping the time to about 1.5 ms:

# hyperfine 'unshare -n true'
Benchmark 1: 'unshare -n true'
  Time (mean ± σ):       1.5 ms ±   0.5 ms    [User: 0.3 ms, System: 1.3 ms]
  Range (min … max):     0.8 ms …   6.7 ms    1907 runs

mqueue Mount

Mounting mqueue originally cost ~16.8 ms. A patch by Al Viro introduced on‑demand mount creation, reducing the benchmark to ~0.7 ms:

# hyperfine 'unshare --propagation=private -m mount -t mqueue mqueue /tmp/mqueue'
Benchmark 1: 'unshare --propagation=private -m mount -t mqueue mqueue /tmp/mqueue'
  Time (mean ± σ):       0.7 ms ±   0.5 ms    [User: 0.5 ms, System: 0.6 ms]
  Range (min … max):     0.0 ms …   3.1 ms    772 runs

IPC Namespace Creation/Destruction

Initial IPC namespace benchmark was ~10.9 ms. A patch accepted in 2020 (by Giuseppe Scrivano) used a work‑queue to free IPC resources, bringing the time down to ~0.1 ms on a modern kernel:
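The kernel patch itself is C and cannot be reproduced here, but the underlying pattern — return to the caller immediately and let a background worker free the resources later — can be sketched in Python (all names below are illustrative, not the kernel's):

```python
import queue
import threading

# Deferred-cleanup sketch: teardown work is enqueued instead of being
# performed synchronously on the caller's hot path.
cleanup_queue = queue.Queue()
freed = []

def cleanup_worker():
    # Background worker, analogous to the kernel work-queue that now
    # frees IPC namespace resources off the critical path.
    while True:
        item = cleanup_queue.get()
        if item is None:
            break
        freed.append(item)  # stand-in for actually releasing resources
        cleanup_queue.task_done()

worker = threading.Thread(target=cleanup_worker, daemon=True)
worker.start()

def destroy_namespace(ns_id):
    # Returns immediately; the expensive free happens asynchronously.
    cleanup_queue.put(ns_id)

destroy_namespace("ipc-ns-1")
destroy_namespace("ipc-ns-2")
cleanup_queue.join()  # wait here only so the demo can observe the result
```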

# hyperfine 'unshare -i true'
Benchmark 1: 'unshare -i true'
  Time (mean ± σ):       0.1 ms ±   0.2 ms    [User: 0.2 ms, System: 0.4 ms]
  Range (min … max):     0.0 ms …   1.5 ms    1966 runs

User‑Time Optimizations (libseccomp)

Most of the user time was spent in libseccomp, which resolved each syscall name in the profile with a linear strcmp search over the table of known syscalls. The original complexity was O(n·m) (n = number of syscalls in the profile, m = total known syscalls).

A patch introduced in January 2020 replaced the linear search with a perfect hash generated by gperf, reducing the lookup complexity to O(n). Benchmarks after the change showed:
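The before/after difference is easy to illustrate. Below, a linear strcmp-style scan over every known syscall name is contrasted with a precomputed table lookup, with Python's dict standing in for the gperf-generated perfect hash (syscall numbers are illustrative x86_64 values):

```python
SYSCALL_TABLE = [("read", 0), ("write", 1), ("open", 2), ("close", 3),
                 ("mmap", 9), ("clone", 56), ("execve", 59)]

def resolve_linear(name):
    # Old behavior: O(m) string comparisons per lookup, so resolving
    # all n syscalls in a profile costs O(n*m) overall.
    for candidate, nr in SYSCALL_TABLE:
        if candidate == name:
            return nr
    return -1

# New behavior: build the lookup structure once (gperf does this at
# compile time with a perfect hash); each lookup is then O(1),
# making the whole profile O(n).
SYSCALL_HASH = dict(SYSCALL_TABLE)

def resolve_hashed(name):
    return SYSCALL_HASH.get(name, -1)

profile = ["execve", "clone", "mmap", "read"]
assert [resolve_linear(s) for s in profile] == [resolve_hashed(s) for s in profile]
```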

# hyperfine 'crun run foo'
Benchmark 1: 'crun run foo'
  Time (mean ± σ):      28.9 ms ±   5.9 ms    [User: 16.7 ms, System: 4.5 ms]
  Range (min … max):    19.1 ms …  41.6 ms    73 runs

Compared to the no‑seccomp baseline of 4.1 ms user time, the seccomp overhead dropped from ~38.9 ms to ~12.6 ms, a three‑fold improvement.

BPF Filter Compilation Caching

Compiling the BPF filter via seccomp_export_bpf remained expensive. Since most containers reuse the same seccomp profile, caching compiled filters and reusing them when possible cuts this cost dramatically. An experimental runtime feature (not yet merged) demonstrated a total run time of ~5.6 ms for the same container command.
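Such a cache can be sketched as content-addressed storage: hash the canonical form of the seccomp profile and compile only on a miss. The compile_bpf function below is a stand-in for the real, expensive seccomp_export_bpf step; everything here is illustrative rather than crun's actual implementation:

```python
import hashlib
import json

_filter_cache = {}
compile_calls = 0  # instrumentation to show the cache working

def _profile_key(profile):
    # Canonical hash of the profile: identical profiles across
    # containers map to the same cache entry.
    return hashlib.sha256(json.dumps(profile, sort_keys=True).encode()).hexdigest()

def compile_bpf(profile):
    # Stand-in for the expensive BPF compilation step.
    global compile_calls
    compile_calls += 1
    return b"BPF:" + _profile_key(profile).encode()

def get_filter(profile):
    key = _profile_key(profile)
    if key not in _filter_cache:
        _filter_cache[key] = compile_bpf(profile)
    return _filter_cache[key]

default_profile = {"defaultAction": "SCMP_ACT_ERRNO",
                   "syscalls": [{"names": ["read", "write"],
                                 "action": "SCMP_ACT_ALLOW"}]}

f1 = get_filter(default_profile)
f2 = get_filter(default_profile)  # cache hit: no second compilation
```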

Conclusion

Over five years, cumulative kernel, seccomp, and runtime improvements have reduced the total OCI container creation and destruction time from nearly 160 ms to just over 5 ms—a roughly 30× speedup.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: performance optimization, Linux kernel, BPF, Container Runtime, seccomp, OCI
Written by

System Architect Go

Programming, architecture, application development, message queues, middleware, databases, containerization, big data, image processing, machine learning, AI, personal growth.