How Facebook Live Scales to Millions: Inside Its Backend Architecture
This article explains how Facebook Live handles millions of concurrent streams and viewers by using a multi‑layer edge cache system, request merging, and load balancing to achieve high‑availability, low‑latency video delivery at massive scale.
Why Facebook Focuses on Live Video
Mark Zuckerberg announced that Facebook is shifting its video strategy toward live streaming, which he sees as the start of a new golden era in which most content will be video-based.
Live video has already demonstrated massive engagement, such as a 45‑minute experiment that attracted 800,000 viewers and over 300,000 comments.
Scale and Challenges
Supporting millions of simultaneous live streams and millions of viewers per stream creates three main challenges:
Handling millions of concurrent live streams.
Serving millions of viewers for each stream.
Managing sudden traffic spikes when a popular person starts a broadcast.
Facebook’s live‑video team grew from fewer than 12 members to over 150 engineers, all tasked with delivering a fault‑tolerant service for a user base of 1.5 billion.
Architecture Overview
Facebook uses a hierarchical caching architecture:
Edge Cache: Distributed globally. Each edge cache forwards requests to a single Origin Server, so many edge caches share one origin (a many-to-one relationship).
Origin Server: Acts as another cache layer and forwards uncached requests to the Streaming Server.
When a user requests a video, the nearest Edge Cache is consulted first. If the content is cached, it is returned immediately. Otherwise, the request is passed to the Origin Server, which may retrieve the data from the Streaming Server, cache it, and then serve it back through the Edge Cache.
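The lookup path described above can be sketched as a chain of cache tiers. This is a minimal illustration, not Facebook's implementation: the `CacheTier` class, the `streaming_server` function, and the segment key format are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


class CacheTier:
    """One cache layer that asks its upstream on a miss (hypothetical model)."""

    def __init__(self, name: str, upstream: Callable[[str], bytes]) -> None:
        self.name = name
        self.upstream = upstream
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        data = self.upstream(key)  # pass the request to the next layer
        self.store[key] = data     # cache the response on the way back
        return data


streaming_calls: List[str] = []


def streaming_server(key: str) -> bytes:
    """Stand-in for the Streaming Server; records every request it receives."""
    streaming_calls.append(key)
    return f"segment:{key}".encode()


# Many edge caches share one origin (many-to-one).
origin = CacheTier("origin", upstream=streaming_server)
edge_a = CacheTier("edge-a", upstream=origin.get)
edge_b = CacheTier("edge-b", upstream=origin.get)

edge_a.get("stream42/seg1")  # misses edge and origin, reaches the streaming server
edge_a.get("stream42/seg1")  # served directly from edge-a
edge_b.get("stream42/seg1")  # misses edge-b, served from the origin cache
```

In this sketch the streaming server is contacted only once, even though three requests were made across two edge caches.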
Request Leakage Problem
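Under high concurrency, many viewers request the same segment at nearly the same moment, before the first response has populated the cache. As a result, about 1.8% of requests leak through to the Streaming Server, which is a significant load at Facebook's scale.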
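To get a feel for what a 1.8% leak means in absolute terms, here is a small arithmetic sketch. The request volume of 1,000,000 is a hypothetical figure picked for illustration; only the 1.8% rate comes from the article.

```python
LEAK_RATE = 0.018  # 1.8% of requests reach the Streaming Server (from the article)


def leaked_requests(total_requests: int, leak_rate: float = LEAK_RATE) -> int:
    """Estimate how many requests bypass both cache layers."""
    return round(total_requests * leak_rate)


# Hypothetical volume: one million requests for the same stream.
print(leaked_requests(1_000_000))  # 18000
```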
Solution: Request Merging
Facebook groups identical concurrent requests into a single request queue, allowing only one request to reach the Origin or Streaming Server (known as request merging). The response is cached and then distributed to all queued requests, dramatically reducing load.
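A minimal sketch of request merging, assuming a thread-based cache server: the first request for a key becomes the "leader" and fetches from upstream, while identical concurrent requests wait for the leader's result. The `MergingCache` class and `slow_upstream` function are hypothetical names for this illustration, not Facebook's code.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class MergingCache:
    """Cache that lets only one request per key reach the upstream at a time."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}
        self._in_flight: Dict[str, Future] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First request for this key: register so others can wait on it.
                future = Future()
                self._in_flight[key] = future
        if not leader:
            return future.result()  # queued request: wait for the leader's response
        try:
            data = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]
            future.set_exception(exc)  # waiting requests see the same failure
            raise
        with self._lock:
            self._cache[key] = data
            del self._in_flight[key]
        future.set_result(data)  # release every queued request
        return data


upstream_calls: List[str] = []


def slow_upstream(key: str) -> bytes:
    """Stand-in for the Origin or Streaming Server, with artificial latency."""
    upstream_calls.append(key)
    time.sleep(0.05)
    return f"segment:{key}".encode()


cache = MergingCache(slow_upstream)
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(cache.get, ["stream42/seg1"] * 20))
```

Twenty concurrent requests for the same segment produce a single upstream call here; the other nineteen are answered from the leader's response.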
In NGINX, for example, the same behavior is enabled with the directive proxy_cache_lock on;
Load Balancing Edge Caches
Edge Cache servers are load‑balanced based on both geographic distance and current load. If a nearby cache is handling 200,000 requests, the load balancer directs new users to a slightly farther cache with lighter traffic, ensuring optimal performance.
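One way to combine distance and load is a single cost score per edge cache. The sketch below assumes a simple linear cost; the `EdgeCache` fields, the `pick_edge` function, and the `km_per_request` weight are hypothetical and not taken from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeCache:
    name: str
    distance_km: float      # distance from the user (hypothetical metric)
    active_requests: int    # current load on this cache


def pick_edge(edges: List[EdgeCache], km_per_request: float = 0.01) -> EdgeCache:
    """Pick the edge with the lowest combined cost of distance and load.

    km_per_request is an assumed weight that converts load into an
    equivalent distance penalty; a real balancer would tune this.
    """
    return min(edges, key=lambda e: e.distance_km + km_per_request * e.active_requests)


edges = [
    EdgeCache("nearby", distance_km=50, active_requests=200_000),
    EdgeCache("farther", distance_km=300, active_requests=20_000),
]
print(pick_edge(edges).name)  # farther
```

With the nearby cache handling 200,000 requests, its cost (50 + 2000) exceeds that of the farther cache (300 + 200), so new users are sent to the farther one. If the nearby cache's load dropped to 10,000 requests, it would win again.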
Compiled from: https://designingforscale.com/how-facebook-live-scales/
21CTO
