Why Google and Facebook Skip Docker: Lessons from Monolithic Repos and Layered Packaging
The article explains how Google and Facebook’s monolithic repositories and unified build systems let them avoid Docker images by using direct module transfer, tarballs, XAR files, and overlay filesystems, while highlighting the technical trade‑offs and challenges of layered caching in large‑scale clusters.
Background
All technical details mentioned can be found in open‑source projects and research papers. The author wrote this after trying to speed up a modified distributed PyTorch program on Facebook’s clusters, illustrating the knowledge required for industrial machine learning.
After graduating in 2007, the author worked at Google for three years, admiring the Borg distributed OS. Leaving Google in 2010, they awaited an open‑source version until Kubernetes appeared.
Kubernetes vs Borg Terminology
Kubernetes schedules containers (the word literally means "shipping containers") that run images, much as an operating system schedules processes that run programs. Borg never exposed containers or images to its users, which raises the question of why Kubernetes introduced them.
Monolithic Repository Insight
Both Google and Facebook use a monolithic repository with a unified build system (Google Blaze/Bazel, Facebook Buck). When code is stored in a single repo, a unified build system can directly sync changed modules to cluster nodes without creating Docker images, ZIP, tarball, RPM, or DEB packages.
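A minimal sketch of the "sync only what changed" idea, assuming each cluster node keeps a manifest mapping module names to content digests. The manifest format and the function names here are hypothetical illustrations, not how Blaze, Bazel, or Buck actually transfer build outputs.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Content digest used to decide whether a module changed."""
    return hashlib.sha256(content).hexdigest()


def modules_to_sync(built: Dict[str, bytes], node_manifest: Dict[str, str]) -> List[str]:
    """Return the modules whose digest differs from what the node already holds.

    built:         module name -> freshly built bytes (hypothetical build output)
    node_manifest: module name -> digest recorded on the node (hypothetical)
    """
    return sorted(
        name
        for name, content in built.items()
        if node_manifest.get(name) != digest(content)
    )


if __name__ == "__main__":
    built = {"A.py": b"print('a')", "B.py": b"print('b v2')", "D.so": b"\x7fELF"}
    node = {"A.py": digest(b"print('a')"), "B.py": digest(b"print('b v1')")}
    # Only the modified module and the one the node has never seen are sent.
    print(modules_to_sync(built, node))
```

Because the comparison is per module, an unchanged module costs no transfer at all, which is the property that makes a separate packaging step unnecessary in this model.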
Packaging Options
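Tarball: The simplest option is to archive the files (e.g., {A,B,C}.py, {D,E,F}.so) into A.zip or A.tar.gz. Putting a version identifier in the file name (e.g., A-953bc.zip) enables cache reuse: a node that already holds that exact version can skip the download.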
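The versioned file name can be sketched as follows, assuming the version suffix is derived from a hash of the package contents. The source only shows names such as A-953bc, so the hashing scheme and both function names are assumptions made for illustration.

```python
import hashlib
import io
import tarfile
from typing import Dict


def versioned_name(package: str, files: Dict[str, bytes]) -> str:
    """Derive a name like A-xxxxx.tar.gz from the package contents (assumed scheme)."""
    h = hashlib.sha256()
    for path in sorted(files):  # sorted so the name does not depend on dict order
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return f"{package}-{h.hexdigest()[:5]}.tar.gz"


def build_tarball(files: Dict[str, bytes]) -> bytes:
    """Pack the files into an in-memory gzip-compressed tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(files):
            info = tarfile.TarInfo(name=path)
            info.size = len(files[path])
            tar.addfile(info, io.BytesIO(files[path]))
    return buf.getvalue()


if __name__ == "__main__":
    files = {"A.py": b"x = 1\n", "D.so": b"\x00\x01"}
    print(versioned_name("A", files), len(build_tarball(files)), "bytes")
```

The name is computed from the inputs rather than from the archive bytes because a gzip stream can embed a timestamp, so two builds of identical files need not produce identical archives.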
XAR: Facebook’s XAR format wraps a SquashFS loopback image with a header. After building with Buck, A.xar contains the files, and running xarexec -m A-953bc.xar mounts it and yields a temporary mount point. Multiple XAR files can be treated as layers (e.g., A-953bc.xar → B-953bc.xar → D-953bc.xar …), but they cannot simply be mounted one after another on the same mount point, because each new mount takes over the target directory and hides what was mounted there before.
Overlay Filesystem
Using fuse-overlayfs, several directories can be overlaid into one view:
fuse-overlayfs -o lowerdir="/tmp/A-953bc:/tmp/B-953bc:..." /packages/A-953bc
The lower directories are the mount points of the XAR files, effectively making each XAR a layer.
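How a path is resolved across stacked lower directories can be illustrated with a small in-memory model. This is only a simulation under the usual overlay convention that the leftmost entry in lowerdir takes precedence. It mounts nothing, and the dictionaries standing in for directories are hypothetical.

```python
from typing import Dict, List, Optional

Layer = Dict[str, str]  # path -> file content, standing in for one mounted XAR


def resolve(path: str, lowerdirs: List[Layer]) -> Optional[str]:
    """Return the content seen at `path`; the leftmost layer that has it wins."""
    for layer in lowerdirs:
        if path in layer:
            return layer[path]
    return None


def merged_view(lowerdirs: List[Layer]) -> Layer:
    """Compute the full merged directory view of all layers."""
    view: Layer = {}
    # Apply the lowest-priority layer first so higher ones overwrite it.
    for layer in reversed(lowerdirs):
        view.update(layer)
    return view


if __name__ == "__main__":
    a = {"lib/util.py": "from A"}
    b = {"lib/util.py": "from B", "lib/b.py": "from B"}
    print(resolve("lib/util.py", [a, b]))  # served by layer A
    print(merged_view([a, b]))
```

The point of the model is that each layer stays untouched on disk: the merged view is computed at lookup time, so a cached layer can be shared by many packages.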
Docker Image and Layers
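Docker images consist of multiple layers that a storage driver combines through an overlay filesystem. When pulling an image, layers already present in the local cache are skipped, saving bandwidth. Docker’s overlay2 storage driver (and the older overlay driver) relies on the kernel’s OverlayFS, and mounting it has traditionally required root privileges, which raises security concerns.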
FUSE‑based overlayfs (e.g., fuse-overlayfs) can be used as an alternative, though with lower performance.
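The bandwidth saving from cached layers can be sketched as a set difference over layer digests. The digests and sizes below are made-up placeholders, and the functions model only the decision of what to fetch, not Docker's actual registry protocol.

```python
from typing import Dict, List, Set


def layers_to_download(image_layers: List[str], local_cache: Set[str]) -> List[str]:
    """Return the layer digests that are missing locally, preserving image order."""
    return [d for d in image_layers if d not in local_cache]


def bytes_saved(image_layers: List[str], local_cache: Set[str], sizes: Dict[str, int]) -> int:
    """Total size of the layers that do not need to be transferred."""
    return sum(sizes[d] for d in image_layers if d in local_cache)


if __name__ == "__main__":
    image = ["base", "python-runtime", "app-v2"]  # hypothetical digests
    cache = {"base", "python-runtime", "app-v1"}
    sizes = {"base": 80, "python-runtime": 50, "app-v2": 3}
    print(layers_to_download(image, cache))
    print(bytes_saved(image, cache, sizes))
```

When only the top application layer changes between releases, the large base layers are never transferred again, which is why layering pays off for frequent deployments.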
Why Google and Facebook Do Not Use Docker
Because their monolithic repos and build systems can directly transfer compiled modules, they do not need packaging concepts like Docker images. Historically, they built fully static binaries, eliminating the need for .so libraries or containers.
Languages such as Java (using JAR), Python (using PAR/subpar), and Go (static linking) fit this model. However, static linking leads to large binaries and longer rebuild times, so most other companies prefer layered Docker images for cache efficiency.
Technical Challenge of a Perfect Solution
A perfect solution should support layered or chunked caching while handling the granularity mismatch between build‑system modules and higher‑level projects. For C/C++, many .so files would create too many layers, slowing startup due to symbol resolution.
One approach is graph partitioning: combine modules into sub‑graphs, compile each sub‑graph into a static archive (.a), link them into a single shared library (.so) per sub‑graph, and use those as cache units.
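As a rough illustration of grouping modules into cache units, the sketch below orders the dependency graph topologically and cuts the order into chunks of bounded size. This greedy chunking is a deliberately naive stand-in, not a real graph partitioner (which would also try to minimize cross-group edges), and the function name and size limit are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def partition(deps: Dict[str, Set[str]], max_size: int) -> List[List[str]]:
    """Split modules into groups of at most `max_size`, dependencies first.

    deps maps each module to the set of modules it depends on. Each returned
    group would become one shared library (one cache unit) in the scheme above.
    """
    order = list(TopologicalSorter(deps).static_order())
    return [order[i:i + max_size] for i in range(0, len(order), max_size)]


if __name__ == "__main__":
    deps = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
    for index, group in enumerate(partition(deps, 2)):
        print(f"group {index}: {group}")
```

Because groups follow a topological order, a module never depends on a later group, so the resulting libraries can be linked and loaded in sequence. Choosing max_size trades the number of cache units against how much must be rebuilt when one module changes.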
References
https://engineering.fb.com/2019/06/06/data-center-engineering/twine/
https://zhuanlan.zhihu.com/p/55452964
https://bazel.build/
https://buck.build/
https://github.com/facebookincubator/xar
https://tldp.org/HOWTO/SquashFS-HOWTO/creatingandusing.html
https://docs.docker.com/storage/storagedriver/select-storage-driver/
https://github.com/google/subpar
Original source: https://zhuanlan.zhihu.com/p/368676698
