Why Google and Facebook Skip Docker: Monolithic Repos, Build Systems, and Layered Packaging
The article explains how Google and Facebook avoid Docker images by using monolithic repositories and unified build systems, detailing alternative packaging methods like tarballs, XAR, overlay filesystems, and the challenges of achieving efficient layered caching without containers.
Background: The author, a former Google engineer now at Facebook, wanted to speed up distributed PyTorch job startup on Facebook's clusters and reflected on why large tech companies use monolithic repositories and custom build systems instead of Docker.
One‑sentence Summary
If you manage code in a monolithic repository with a unified build system, you can avoid Docker images entirely by directly packaging build artifacts as tarballs, XAR files, or overlay filesystem layers.
Monolithic Repository vs. Packaged Images
In a monolithic repo all projects live in a single (or very few) repositories with a shared build system (Google Blaze, Facebook Buck). When a module changes, the build system can sync only the changed artifact (e.g., a .so or .jar) to the target node, eliminating the need for Docker images, ZIPs, or other package formats.
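The "sync only the changed artifact" step can be sketched as a digest comparison between what the build produced and what the node already has. This is a minimal illustration under stated assumptions, not how Blaze or Buck actually do it. The helper names (`file_digest`, `build_manifest`, `artifacts_to_sync`) and the idea of keeping a per-node manifest of digests are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(build_dir: Path) -> dict[str, str]:
    """Map each artifact's path (relative to build_dir) to its content digest."""
    return {
        p.relative_to(build_dir).as_posix(): file_digest(p)
        for p in sorted(build_dir.rglob("*"))
        if p.is_file()
    }


def artifacts_to_sync(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Return artifacts that are missing on the node or whose digest differs.

    `remote` is a hypothetical manifest of what the target node already holds.
    """
    return sorted(rel for rel, digest in local.items() if remote.get(rel) != digest)
```

For example, `artifacts_to_sync(build_manifest(Path("buck-out")), remote)` would list only the .so or .jar files that must be copied; everything else on the node is reused as-is.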
Packaging Options
Tarball: A simple .tar.gz (or .zip) archive containing all required files, optionally versioned with a commit hash (e.g., A-953bc.zip).
XAR: Facebook’s XAR format combines a header with a squashfs loopback image; each module can be built into its own .xar (e.g., A-953bc.xar).
Overlay Filesystem: fuse-overlayfs stacks multiple module directories into a single view, mimicking Docker’s layered approach without Docker.
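The tarball and overlay options can be sketched in a few lines. This assumes one directory of build output per module and a 5-character short commit hash to match the A-953bc naming above; `archive_name`, `package_module`, and `overlay_mount_cmd` are hypothetical helpers. The mount helper only builds a `fuse-overlayfs` command line using the standard overlayfs `lowerdir`/`upperdir`/`workdir` options; it does not run it.

```python
import tarfile
from pathlib import Path


def archive_name(module: str, commit: str, ext: str = ".tar.gz") -> str:
    """Version a module archive with a short commit hash, e.g. A-953bc.tar.gz."""
    return f"{module}-{commit[:5]}{ext}"


def package_module(module_dir: Path, module: str, commit: str, out_dir: Path) -> Path:
    """Pack one module's build output into its own versioned tarball."""
    archive = out_dir / archive_name(module, commit)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(module_dir), arcname=module)
    return archive


def overlay_mount_cmd(layers: list[Path], upper: Path, work: Path, merged: Path) -> list[str]:
    """Build (but do not run) a fuse-overlayfs command that stacks module dirs.

    overlayfs treats the leftmost lowerdir as the topmost layer, so files in
    earlier entries shadow files in later ones. upperdir/workdir hold writes.
    """
    lowerdir = ":".join(str(p) for p in layers)
    return [
        "fuse-overlayfs",
        "-o",
        f"lowerdir={lowerdir},upperdir={upper},workdir={work}",
        str(merged),
    ]
```

Keeping each module in its own archive or directory is what makes the later layer reuse possible: a module whose archive name (and thus commit) is unchanged never needs to be repacked or re-sent.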
Layered Distribution
When only A.py changes, only A.xar needs to be rebuilt; the cached layers B.xar through F.xar are reused unchanged. Mounting these XAR layers directly via xarexec -m then provides the combined filesystem view.
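The rebuild-or-reuse decision can be modeled by naming each layer after its module and a content hash, then checking a layer cache. This is a simplified sketch: the `Module` type, the `plan_layers` helper, and the set-based cache are hypothetical, and a real build system would derive the hash from the module's sources and build inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    content_hash: str  # hypothetical: hash of the module's inputs at this commit

    @property
    def layer(self) -> str:
        """Layer file name, e.g. A-953bc.xar."""
        return f"{self.name}-{self.content_hash[:5]}.xar"


def plan_layers(modules: list[Module], cache: set[str]) -> tuple[list[str], list[str]]:
    """Split layers into (to rebuild, reusable from cache)."""
    rebuild: list[str] = []
    reuse: list[str] = []
    for m in modules:
        (reuse if m.layer in cache else rebuild).append(m.layer)
    return rebuild, reuse
```

Because each module is its own layer here, a change to A invalidates only A's layer. Docker's build cache, by contrast, also invalidates every layer built on top of a changed one.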
Docker and OverlayFS
Docker images are stacks of filesystem layers, which Docker typically mounts with overlayfs (its default overlay2 storage driver on Linux); Docker’s build cache works by reusing unchanged layers. The article shows how the same effect can be achieved with XAR plus overlayfs, and notes that Docker’s reliance on kernel-mode drivers (overlayfs, btrfs) can raise security concerns.
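How stacked layers turn into one filesystem view can be shown with a toy model: apply layers from bottom to top, let upper files shadow lower ones, and let a whiteout entry hide a lower file. This is a deliberate simplification that models layers as flat path-to-content maps; real overlayfs also merges directories and marks whiteouts with special device files, and the `WHITEOUT` sentinel and `merged_view` helper are hypothetical.

```python
WHITEOUT = object()  # stands in for an overlayfs/Docker whiteout entry


def merged_view(layers: list[dict[str, object]]) -> dict[str, object]:
    """Merge layers given top-first, as in overlayfs lowerdir order."""
    view: dict[str, object] = {}
    for layer in reversed(layers):  # apply the bottom layer first
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)  # an upper layer deleted this file
            else:
                view[path] = content  # upper layers shadow lower ones
    return view
```

Swapping in a new top layer changes the merged view without touching any lower layer, which is exactly why unchanged layers can be cached and reused.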
Why Google/Facebook Don’t Use Docker
Because their monolithic repos and build tools can directly define modules as layers, they can skip the packaging step entirely and transfer only the rebuilt modules. This reduces build-to-run latency but sacrifices the generic layering benefits that Docker provides.
Technical Challenges
Fine-grained modules (one .so file per module) can produce too many layers, and loading that many shared libraries makes symbol resolution slow. A proposed solution is graph partitioning: group modules into sub-graphs, compile each sub-graph into a static archive (.a), then link those into a single shared library (.so) that serves as the cache unit.
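One simple way to sketch the partitioning step is to chunk a topological order of the dependency graph into fixed-size groups. This heuristic is an assumption for illustration, not the partitioning method the article proposes. Because every dependency precedes its dependents in the order, each group depends only on itself or earlier groups, so the group graph stays acyclic. The `partition` and `link_commands` helpers, the `<module>.o` object names, and the `libgroupN` naming are hypothetical; the `ar` and linker flags are standard GNU toolchain options, and the objects are assumed to be compiled with `-fPIC`.

```python
from graphlib import TopologicalSorter


def partition(deps: dict[str, set[str]], max_group: int) -> list[list[str]]:
    """Chunk modules (deps: module -> modules it depends on) in topological order."""
    order = list(TopologicalSorter(deps).static_order())  # dependencies first
    return [order[i:i + max_group] for i in range(0, len(order), max_group)]


def link_commands(groups: list[list[str]]) -> list[list[str]]:
    """Build (but do not run) commands: objects -> .a -> one .so per group."""
    cmds: list[list[str]] = []
    for i, group in enumerate(groups):
        archive = f"libgroup{i}.a"
        cmds.append(["ar", "rcs", archive, *(f"{m}.o" for m in group)])
        # --whole-archive keeps every object, not just ones referenced at link time.
        cmds.append([
            "cc", "-shared", "-o", f"libgroup{i}.so",
            "-Wl,--whole-archive", archive, "-Wl,--no-whole-archive",
        ])
    return cmds
```

The group size trades off the two problems above: larger groups mean fewer shared libraries to resolve at startup, while smaller groups mean a change invalidates less cached code.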
Conclusion
Monolithic repositories with unified build systems can eliminate Docker images, but achieving efficient layered caching still requires careful module grouping or adopting Docker‑like technologies (e.g., btrfs, overlayfs) when necessary.
dbaplus Community
