How Facebook Live Scales to Millions: Inside Its Backend Architecture

This article explains how Facebook Live handles millions of concurrent streams and viewers by using a multi‑layer edge cache system, request merging, and load balancing to achieve high‑availability, low‑latency video delivery at massive scale.

21CTO

May 8, 2017

Why Facebook Focuses on Live Video

Mark Zuckerberg announced that Facebook is shifting its video strategy toward live streaming, describing it as the start of a new golden era in which most content will be video‑based.

Live video has already demonstrated massive engagement, such as a 45‑minute experiment that attracted 800,000 viewers and over 300,000 comments.

Scale and Challenges

Supporting millions of simultaneous live streams and millions of viewers per stream creates three main challenges:

Handling millions of concurrent live streams.

Serving millions of viewers for each stream.

Managing sudden traffic spikes when a popular person starts a broadcast.

Facebook’s live‑video team grew from fewer than 12 members to over 150 engineers, all tasked with delivering a fault‑tolerant service for a user base of 1.5 billion.

Architecture Overview

Facebook uses a hierarchical caching architecture:

Edge Cache: Distributed globally. Each edge cache forwards requests to a single Origin Server, so many edge caches share one origin (a many‑to‑one relationship).

Origin Server: Acts as another cache layer and forwards uncached requests to the Streaming Server.

When a user requests a video, the nearest Edge Cache is consulted first. If the content is cached there, it is returned immediately. Otherwise, the request is passed to the Origin Server. If the Origin Server does not have the content either, it retrieves the data from the Streaming Server, caches it, and serves it back through the Edge Cache.
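The lookup path described above can be sketched in a few lines. This is a minimal illustration, not Facebook's implementation: the `CacheTier` class, the `streaming_server` function, and the segment name are all hypothetical, and a plain in-memory dictionary stands in for each cache.

```python
from typing import Callable, Dict


class CacheTier:
    """One cache layer that falls back to an upstream fetcher on a miss."""

    def __init__(self, name: str, upstream: Callable[[str], bytes]) -> None:
        self.name = name
        self.upstream = upstream           # next layer to ask on a miss
        self.store: Dict[str, bytes] = {}  # in-memory stand-in for the cache
        self.hits = 0
        self.misses = 0

    def get(self, segment_id: str) -> bytes:
        if segment_id in self.store:
            self.hits += 1
            return self.store[segment_id]
        self.misses += 1
        data = self.upstream(segment_id)   # forward the uncached request
        self.store[segment_id] = data      # cache the response on the way back
        return data


def streaming_server(segment_id: str) -> bytes:
    """Hypothetical stand-in for the Streaming Server."""
    return f"video-bytes-for-{segment_id}".encode()


# Two edge caches share one origin (many-to-one).
origin = CacheTier("origin", upstream=streaming_server)
edge_a = CacheTier("edge-a", upstream=origin.get)
edge_b = CacheTier("edge-b", upstream=origin.get)

first = edge_a.get("stream42/seg001")   # misses edge-a and origin
second = edge_a.get("stream42/seg001")  # served from edge-a
third = edge_b.get("stream42/seg001")   # misses edge-b, served from origin
```

In this sketch the Streaming Server is contacted only once for the segment, even though two different edge caches asked for it, because the shared origin absorbs the second miss.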

Request Leakage Problem

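Under high concurrency, many viewers request the same segment at the same moment, before the first response has been cached. Each of those requests is a cache miss, so about 1.8% of requests leak through toward the Streaming Server. At Facebook’s scale, even that small fraction creates significant load.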
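To get a feel for what a 1.8% leak means in absolute terms, the arithmetic can be written out. The request totals below are hypothetical round numbers chosen for illustration, not figures from Facebook; only the 1.8% rate comes from the text.

```python
def leaked_requests(total_requests: int, leak_per_thousand: int = 18) -> int:
    """Requests that miss the caches, using integer math (18 per 1,000 = 1.8%)."""
    return total_requests * leak_per_thousand // 1000


# Hypothetical request volumes, for illustration only.
leak_at_one_million = leaked_requests(1_000_000)   # 18,000 requests
leak_at_five_million = leaked_requests(5_000_000)  # 90,000 requests
```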

Solution: Request Merging

When several requests for the same content arrive at the same time, Facebook groups them into a single request queue and lets only one of them pass through to the Origin or Streaming Server. This is known as request merging. The response is cached and then returned to every queued request, which dramatically reduces upstream load.

In nginx, for example, the proxy_cache_lock directive enables this behavior:

proxy_cache_lock on;
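The request merging described above can be sketched with Python threads. This is a minimal sketch under stated assumptions, not Facebook's code: `CoalescingCache`, `_Pending`, and `slow_upstream` are hypothetical names, and the 50 ms sleep merely simulates a slow upstream fetch.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class _Pending:
    """Bookkeeping for one in-flight upstream fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.data: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class CoalescingCache:
    """Cache that lets only one request per key reach the upstream."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}
        self._in_flight: Dict[str, _Pending] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            pending = self._in_flight.get(key)
            leader = pending is None
            if leader:                      # first request for this key
                pending = _Pending()
                self._in_flight[key] = pending
        if not leader:                      # queued request: wait for the leader
            pending.done.wait()
            if pending.error is not None:
                raise pending.error
            return pending.data
        try:
            pending.data = self._fetch(key)  # the single upstream request
        except BaseException as exc:
            pending.error = exc
            raise
        finally:
            with self._lock:
                if pending.error is None:
                    self._cache[key] = pending.data
                del self._in_flight[key]
            pending.done.set()              # release every queued request
        return pending.data


upstream_calls: List[str] = []


def slow_upstream(key: str) -> bytes:
    """Hypothetical upstream that records each call and responds slowly."""
    upstream_calls.append(key)
    time.sleep(0.05)
    return f"segment:{key}".encode()


cache = CoalescingCache(slow_upstream)
results: List[bytes] = []
viewers = [
    threading.Thread(target=lambda: results.append(cache.get("seg001")))
    for _ in range(20)
]
for t in viewers:
    t.start()
for t in viewers:
    t.join()
```

Twenty simulated viewers ask for the same segment, but `slow_upstream` runs once. The cache entry is written and the in-flight entry removed under the same lock, so a late request always finds either the cached value or the pending fetch.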

Load Balancing Edge Caches

Edge Cache servers are load‑balanced based on both geographic distance and current load. If a nearby cache is already handling 200,000 requests, the load balancer directs new users to a slightly farther cache with lighter traffic, trading a little extra distance for a less congested server.
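One simple way to combine distance and load is to pick the nearest cache that still has headroom. The sketch below assumes that policy. The `EdgeCache` fields, the `pick_edge` function, the distances, and the per-cache limit of 200,000 requests are hypothetical values chosen to mirror the example above, not Facebook's actual algorithm or thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeCache:
    name: str
    distance_km: float    # distance from the user (assumed known)
    active_requests: int  # current load
    max_requests: int     # assumed per-cache limit


def pick_edge(caches: List[EdgeCache]) -> EdgeCache:
    """Return the nearest cache with spare capacity.

    If every cache is saturated, fall back to the least utilized one.
    """
    if not caches:
        raise ValueError("no edge caches available")
    by_distance = sorted(caches, key=lambda c: c.distance_km)
    for cache in by_distance:
        if cache.active_requests < cache.max_requests:
            return cache
    return min(by_distance, key=lambda c: c.active_requests / c.max_requests)


busy_nearby = EdgeCache("nearby", 50.0, 200_000, 200_000)
light_farther = EdgeCache("farther", 400.0, 60_000, 200_000)
chosen_when_busy = pick_edge([busy_nearby, light_farther])

calm_nearby = EdgeCache("nearby", 50.0, 120_000, 200_000)
chosen_when_calm = pick_edge([calm_nearby, light_farther])
```

With the nearby cache saturated, the farther cache is chosen. Once the nearby cache has headroom again, it wins on distance. A production balancer would more likely use a weighted score than a hard threshold, but the threshold keeps the trade-off easy to see.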

Compiled from: https://designingforscale.com/how-facebook-live-scales/
