Scalable and Reliable Configuration Distribution at Facebook
This article explains how Facebook’s Configerator system achieves scalable, reliable configuration distribution using a push model, a hierarchical Zeus tree, Package Vessel for large data, and multi‑repo Git strategies to improve commit throughput and fault tolerance.
In the previous article we introduced Facebook’s approach of treating configuration items as code. This article discusses the challenges of distributing massive configuration data at scale and Facebook’s solutions.
1. Scalable and Reliable Configuration Distribution
Configerator pushes configuration changes to millions of servers across continents, where failures are common. Beyond scalability and reliability, key properties include availability (applications keep running despite config tool failures) and data consistency (all instances receive changes in the same order).
Configerator uses a push model: a Git Tailer continuously reads changes from a Git repository (the single source of truth) and writes them to Zeus, a fork of ZooKeeper optimized for Facebook’s environment. Zeus runs a three‑level high‑fan‑out distribution tree (Leader → Observer → Proxy) to push updates efficiently.
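The fan-out idea can be sketched in a few lines: one leader write propagates down through observers to every proxy, so no server ever polls. This is a minimal illustrative model, not Zeus's actual implementation (which is built on ZooKeeper internals); the node names are hypothetical.

```python
class Node:
    """A tree node that forwards configuration updates to its children."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.latest = None  # last (version, payload) this node has seen

    def add_child(self, child):
        self.children.append(child)

    def push(self, version, payload):
        # Apply locally, then fan out to every child, so a single
        # leader write reaches all proxies without any polling.
        self.latest = (version, payload)
        for child in self.children:
            child.push(version, payload)


# Build a tiny three-level tree: 1 leader, 2 observers, 2 proxies each.
leader = Node("leader")
proxies = []
for i in range(2):
    obs = Node(f"observer-{i}")
    leader.add_child(obs)
    for j in range(2):
        proxy = Node(f"proxy-{i}-{j}")
        obs.add_child(proxy)
        proxies.append(proxy)

leader.push(1, "gatekeeper.on=true")
# Every proxy now holds the same version, delivered in the same order.
assert all(p.latest == (1, "gatekeeper.on=true") for p in proxies)
```

Because each level multiplies the fan-out, adding one more tier of observers or proxies scales the tree to far more subscribers without increasing load on the leader.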
For very large data, Configerator employs a peer‑to‑peer (P2P) protocol via the Package Vessel tool, separating large payloads from metadata and using BitTorrent‑style distribution to avoid overloading central storage.
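The key move is separating the large payload from its small metadata: only a version and checksum travel through the config distribution path, while the bytes themselves come from peers. The sketch below simulates that split with in-memory dictionaries; the `publish`/`fetch` names are illustrative assumptions, not Package Vessel's real API.

```python
import hashlib

PEERS = {}     # simulated P2P swarm: version -> payload bytes
METADATA = {}  # distributed via the config system: version -> sha256 digest


def publish(version, payload: bytes):
    """Seed the payload to peers; push only its checksum through Configerator."""
    PEERS[version] = payload
    METADATA[version] = hashlib.sha256(payload).hexdigest()


def fetch(version) -> bytes:
    """Download from a peer and verify against the trusted metadata."""
    payload = PEERS[version]
    digest = hashlib.sha256(payload).hexdigest()
    if digest != METADATA[version]:
        raise ValueError("corrupt or tampered download")
    return payload


publish(7, b"x" * 1024 * 1024)  # a "large" 1 MiB blob
assert fetch(7) == b"x" * 1024 * 1024
```

Verifying the checksum from the trusted metadata channel is what lets the bulk bytes travel over an untrusted peer-to-peer path without weakening consistency.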
2. Push vs. Pull Model
The pull model is simple and stateless but incurs polling overhead and does not scale well with thousands of configuration items per server. Facebook therefore adopts a push model to reduce latency and bandwidth consumption.
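A back-of-the-envelope comparison shows why polling does not scale. The numbers below are illustrative assumptions, not Facebook's actual figures, and the push estimate takes the worst case where every change fans out to every server.

```python
servers = 1_000_000          # fleet size (assumed)
items_per_server = 1_000     # config items each server watches (assumed)
poll_interval_s = 60         # pull model: poll every minute (assumed)
changes_per_day = 10_000     # actual config updates per day (assumed)

# Pull: every item is polled on every interval, changed or not.
pull_requests_per_day = servers * items_per_server * (86_400 // poll_interval_s)

# Push: traffic is proportional to real changes only; worst case,
# each change is delivered once to every subscribed server.
push_messages_per_day = changes_per_day * servers

assert push_messages_per_day < pull_requests_per_day
print(f"pull/push ratio: {pull_requests_per_day / push_messages_per_day:,.0f}x")
```

Even with these conservative assumptions the pull model generates orders of magnitude more traffic, almost all of it reporting "nothing changed".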
3. Improving Commit Throughput
Many engineers committing concurrently to a single shared Git repository cause contention, since each diff must be rebased against the latest head before it can land. Facebook addressed this in two ways: a “Landing Strip” component accepts diffs from committers and serializes them onto the repository on their behalf, and the single repository was split into multiple smaller Git repositories that together provide a partitioned global namespace, increasing overall commit throughput.
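Both ideas can be sketched together: a Landing Strip serializes commits to one repository, and the first path component routes each diff to its partition so commits in different partitions never contend. Class and partition names here are hypothetical, chosen only for illustration.

```python
class LandingStrip:
    """Serializes concurrent diffs onto one Git repository (simulated)."""

    def __init__(self, repo):
        self.repo = repo
        self.history = []  # committed diffs, in landing order

    def land(self, author, diff):
        # In the real system this would rebase and push; here we
        # just record the commit in strict serial order.
        self.history.append((author, diff))


# Partitioned global namespace: the first path component selects a repo,
# so commits to different partitions proceed in parallel.
strips = {name: LandingStrip(name) for name in ("ads", "feed", "search")}


def commit(author, path, diff):
    partition = path.split("/")[0]
    strips[partition].land(author, diff)


commit("alice", "ads/budget.cconf", "cap=100")
commit("bob", "feed/ranker.cconf", "w=0.3")
assert strips["ads"].history == [("alice", "cap=100")]
assert strips["feed"].history == [("bob", "w=0.3")]
```

Serialization removes the rebase race within one repository, and partitioning multiplies throughput by the number of repositories, since each strip lands commits independently.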
4. Fault Tolerance
Each component (Git repositories, Zeus, observers, proxies) is replicated across regions in a master–backup setup. If an upstream component fails, failover mechanisms take over, and in the meantime applications can still read configuration data from local disk caches.
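The disk-cache fallback that underpins the availability property can be sketched as follows: the proxy persists every pushed update before serving it, so even if the upstream tree becomes unreachable, applications keep reading the last good configuration. This is an illustrative model under assumed names, not the actual proxy code.

```python
import json
import os
import tempfile


class Proxy:
    """Serves configs from memory; falls back to an on-disk cache."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        self.live = None  # in-memory copy, valid while upstream is healthy

    def on_update(self, config):
        # Persist every update to disk before acknowledging it.
        self.live = config
        with open(self.cache_path, "w") as f:
            json.dump(config, f)

    def on_upstream_failure(self):
        # Simulate losing the live feed from the distribution tree.
        self.live = None

    def read(self):
        if self.live is not None:
            return self.live
        # Availability: fall back to the last config written to disk.
        with open(self.cache_path) as f:
            return json.load(f)


cache = os.path.join(tempfile.mkdtemp(), "config.json")
proxy = Proxy(cache)
proxy.on_update({"ttl": 30})
proxy.on_upstream_failure()
assert proxy.read() == {"ttl": 30}
```

Writing to disk before acknowledging the update is what guarantees that a crash or network partition never leaves applications without a readable configuration.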
5. Summary
Configerator solves key challenges of configuration authoring, error prevention, and distribution by treating configuration as code, using dependency‑driven modules, employing a push‑based distribution tree for small data, a P2P protocol for large data, delegating commits to a Landing Strip, and partitioning Git repositories for higher throughput.
Continuous Delivery 2.0
Tech and case studies on organizational management, team management, and engineering efficiency