Scaling Facebook Live: Edge Cache Architecture & Request Merging

Facebook Live must support millions of concurrent streams and viewers, so its architecture uses a multi‑layer edge‑cache system, request merging, and load balancing to filter traffic, reduce streaming server load, and prevent request leakage, ensuring high performance during traffic spikes.

Java High-Performance Architecture

May 7, 2017

Challenge

By the end of 2016, Facebook had 1.86 billion monthly active users, which put Facebook Live under heavy pressure from massive numbers of concurrent streams and viewers.

The main challenges are:

Supporting millions of live streams simultaneously.

Supporting millions of viewers on a single stream.

Live traffic is also spiky: when a celebrity goes live, a sudden influx of viewers creates a massive peak.

Architecture

When many requests arrive together, a thundering‑herd effect can cause severe latency, packet loss, and new users failing to connect.

To mitigate this, requests are first filtered by a multi‑layer structure so that only necessary traffic reaches the streaming servers.

Edge Cache servers are distributed globally and have a many‑to‑one relationship with the Origin Server; multiple Edge Caches can request from the same Origin Server.

Workflow:

User requests first reach the nearest Edge Cache, which acts as a simple cache layer.

If the requested data is in the Edge Cache, it is returned directly to the user.

If not, the request is forwarded to the Origin Server, which also functions as a cache.

If the Origin Server holds the data, it returns it to the Edge Cache, which then serves the user and caches a copy.

If the Origin Server lacks the data, the request goes to the Streaming Server; the data returns to the Origin Server, then to the Edge Cache, and finally to the user, with both the Origin and Edge caches storing the data.

Subsequent requests for the same data are served from the Edge Cache, or from the Origin Server when they arrive through a different Edge Cache.

This architecture dramatically reduces the number of requests that reach the Streaming Server. For example, if five requests arrive at an Edge Cache, only the first traverses the full path to the Streaming Server; the remaining four are served from the Edge Cache.
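The five-request example can be sketched in a few lines of Python. This is a minimal single-threaded illustration, not Facebook's implementation; the class names `StreamingServer` and `CacheLayer` and the segment ID are hypothetical.

```python
class StreamingServer:
    """Stands in for the real streaming backend; counts how often it is hit."""

    def __init__(self):
        self.hits = 0

    def fetch(self, segment_id):
        self.hits += 1
        return f"data-for-{segment_id}"


class CacheLayer:
    """A cache that falls through to an upstream layer on a miss."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def fetch(self, segment_id):
        if segment_id not in self.store:
            # Miss: ask the next layer, then keep a copy for later requests.
            self.store[segment_id] = self.upstream.fetch(segment_id)
        return self.store[segment_id]


# Edge Cache -> Origin Server -> Streaming Server
streaming = StreamingServer()
origin = CacheLayer(streaming)
edge = CacheLayer(origin)

responses = [edge.fetch("segment-42") for _ in range(5)]
print(streaming.hits)  # 1: only the first request travels the full path
```

Because the requests here arrive one after another, the first one always fills the cache before the second arrives. Real traffic does not behave this way.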

How to Prevent Request Leakage?

Effective as this architecture is, about 1.8% of requests still leak through to the Streaming Server, which is a significant volume at Facebook’s scale.

Leakage Causes

Leakage occurs under high concurrency: when multiple requests for the same uncached data packet arrive at an Edge Cache at the same time, every one of them misses and is forwarded to the Origin Server.

Similarly, if several Edge Caches request the same uncached data from an Origin Server at the same time, all of those requests pass through to the Streaming Server.

Thus, Facebook’s massive concurrency leads to substantial request leakage.
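The leakage can be reproduced with a naive check-then-fetch cache. The sketch below is an illustration under assumptions: the names are hypothetical, and a `threading.Barrier` is used only to force the worst case in which every request checks the cache before any of them has filled it.

```python
import threading

NUM_REQUESTS = 4

cache = {}
origin_requests = 0
counter_lock = threading.Lock()
all_checked = threading.Barrier(NUM_REQUESTS)


def fetch_from_origin(segment_id):
    """Hypothetical upstream call; counts how many requests leak through."""
    global origin_requests
    with counter_lock:
        origin_requests += 1
    return f"data-for-{segment_id}"


def handle_request(segment_id):
    hit = segment_id in cache
    # Force the worst case: every request checks before any fills the cache.
    all_checked.wait()
    if not hit:
        cache[segment_id] = fetch_from_origin(segment_id)
    return cache[segment_id]


threads = [
    threading.Thread(target=handle_request, args=("segment-42",))
    for _ in range(NUM_REQUESTS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(origin_requests)  # 4: every concurrent miss was forwarded upstream
```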

Solution

Facebook’s approach is simple: when multiple requests arrive at an Edge Cache for the same data packet, they are grouped into a single request queue, and only one request is sent to the Origin Server (request merging). The response is cached and then returned to all queued requests.

The same merging strategy is applied at the Origin Server level: if multiple Edge Caches request the same data concurrently, only one request reaches the Streaming Server.

This effectively resolves the concurrency‑induced leakage problem.
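The merging strategy described above can be sketched as follows. This is one possible way to implement it, not Facebook's code: the `CoalescingCache` class and the `slow_origin_fetch` function are hypothetical, and a shared `concurrent.futures.Future` stands in for the request queue. The first request for a key becomes the leader and goes upstream; the others wait on the same future.

```python
import threading
import time
from concurrent.futures import Future


class CoalescingCache:
    """Cache that sends at most one upstream request per missing key."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream
        self._lock = threading.Lock()
        self._cache = {}
        self._in_flight = {}  # key -> Future shared by all waiting requests

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if not leader:
            # Follower: wait for the leader's response instead of going upstream.
            return future.result()
        try:
            value = self._fetch_upstream(key)
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]
            future.set_exception(exc)
            raise
        with self._lock:
            self._cache[key] = value
            del self._in_flight[key]
        future.set_result(value)
        return value


upstream_calls = 0
calls_lock = threading.Lock()


def slow_origin_fetch(key):
    """Hypothetical upstream call, slowed down so the requests overlap."""
    global upstream_calls
    with calls_lock:
        upstream_calls += 1
    time.sleep(0.05)
    return f"data-for-{key}"


edge = CoalescingCache(slow_origin_fetch)
results = []


def viewer():
    results.append(edge.get("segment-42"))


threads = [threading.Thread(target=viewer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(upstream_calls)  # 1: eight concurrent viewers, one upstream request
```

The cache entry is written and the in-flight entry is removed under the same lock, so a request arriving at any moment either finds the cached value or joins the pending future. No second upstream request is issued for the same key.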

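In Nginx, request merging can be enabled with the proxy_cache_lock directive, which allows only one request at a time to populate a new cache entry:

proxy_cache_lock on;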

Load Balancing

Another crucial aspect is load balancing the Edge Cache servers.

During traffic peaks, an Edge Cache may be handling hundreds of thousands of requests. A load balancer distributes incoming requests to less‑loaded Edge Caches, possibly slightly farther away, to balance distance and server pressure.

The load balancer makes a comprehensive decision based on proximity and current load, directing users to the most suitable Edge Cache server.
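The article does not say how proximity and load are combined. A minimal sketch, assuming a simple weighted score where lower is better, might look like this. The `EdgeCache` fields, the weights, and the `max_distance_km` normalization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeCache:
    name: str
    distance_km: float    # distance from the user
    active_requests: int  # current load
    capacity: int         # requests the server can comfortably handle


def pick_edge(edges, distance_weight=1.0, load_weight=1.0, max_distance_km=2000.0):
    """Return the Edge Cache with the lowest combined distance and load score."""

    def score(edge):
        utilization = edge.active_requests / edge.capacity
        normalized_distance = edge.distance_km / max_distance_km
        return distance_weight * normalized_distance + load_weight * utilization

    return min(edges, key=score)


# During a spike, the nearby server is almost saturated.
peak = [
    EdgeCache("nearby", distance_km=50, active_requests=190_000, capacity=200_000),
    EdgeCache("farther", distance_km=400, active_requests=20_000, capacity=200_000),
]
chosen_at_peak = pick_edge(peak)
print(chosen_at_peak.name)  # farther

# Under light load, proximity wins.
quiet = [
    EdgeCache("nearby", distance_km=50, active_requests=10_000, capacity=200_000),
    EdgeCache("farther", distance_km=400, active_requests=20_000, capacity=200_000),
]
chosen_when_quiet = pick_edge(quiet)
print(chosen_when_quiet.name)  # nearby
```

Adjusting the two weights shifts the trade-off between latency from distance and pressure on individual servers.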

Summary

This article, translated from designingforscale.com, highlights a core strategy: using a multi‑layer architecture to filter requests, thereby improving server performance and availability.
