How Weibo Handles Billion‑Scale Short Video Traffic: High‑Concurrency Architecture Deep Dive
This article explains how Weibo's video team designs a highly available, high‑concurrency architecture for short‑video services, covering team responsibilities, business scenarios, microservice design, caching layers, multi‑data‑center HA, and circuit‑breaker mechanisms to sustain unpredictable traffic spikes.
Team Introduction
We are a technical team within Weibo's R&D video platform, responsible for core video services such as video posts, "Weibo Stories", short videos, and live streaming, as well as the underlying video platform infrastructure (file platform, transcoding, scheduling, media library).
Our goal is to let Weibo absorb millions of newly uploaded videos every day while supporting diverse custom requirements.
Business Scenario
The video service must cope with sudden traffic surges caused by hot events (e.g., celebrity rumors, breaking news). Provisioning a fixed number of servers does not solve this: over‑provisioning wastes resources during low traffic, while under‑provisioning risks crashes during spikes.
These surges are unpredictable, unlike scheduled high‑traffic events like "Double 11".
"Weibo Stories" Architecture Design
The service is built as a microservice system. The interface layer mixes Web API and internal RPC calls. A façade layer aggregates several vertical microservices, each exposing specific functionality, while dependent services owned by other departments (e.g., the user follow service) are accessed via RPC. The storage layer combines cache and database.
Technical Challenges
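Consider a typical scenario: each user follows 500 friends, and the homepage is refreshed 100,000 times per second. Every refresh has to check all 500 followed accounts, so the backend sees 500 × 100,000 = 50 million lookups per second. That figure does not yet include expiration checks and ordering, and it is far beyond what a naive design can serve.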
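The estimate above is plain multiplication. A minimal sketch of the arithmetic, using only the two figures from the scenario (the variable names are illustrative):

```python
# Back-of-envelope fan-out estimate for one homepage refresh workload.
followed_per_user = 500          # friends followed by a typical user
refreshes_per_second = 100_000   # homepage refreshes per second

# Each refresh must look up the latest videos of every followed account.
lookups_per_second = followed_per_user * refreshes_per_second

print(f"{lookups_per_second:,} lookups per second")  # 50,000,000 lookups per second
```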
Solution Comparison
We considered two Feed models:
Feed Push Model: pushes each new video to every follower, which becomes infeasible when a user has tens of millions of followers.
Feed Pull Model: followers pull the latest videos on demand. Given Weibo's massive user base and the need for consistency, we chose the Pull model.
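To make the Pull model concrete, here is a minimal sketch of a read‑time merge. It is not Weibo's implementation: the `store` dictionary is a hypothetical in‑memory stand‑in for the per‑author video lists, and `pull_feed` is an illustrative name.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# (timestamp, video_id); each author's list is kept newest first.
Video = Tuple[int, str]


def pull_feed(following: List[str],
              store: Dict[str, List[Video]],
              limit: int = 10) -> List[Video]:
    """Build a feed at read time by merging each followed author's videos."""
    per_author = [store.get(author, []) for author in following]
    # Inputs are sorted newest first, so merge in descending order.
    merged = heapq.merge(*per_author, reverse=True)
    # Stop as soon as `limit` items have been produced.
    return list(itertools.islice(merged, limit))


store = {
    "alice": [(5, "a2"), (1, "a1")],
    "bob": [(3, "b1")],
}
feed = pull_feed(["alice", "bob", "carol"], store, limit=2)
```

Nothing is written when an author posts, so an account with tens of millions of followers costs no more to publish from than any other. The price is paid on the read path, which is why the caching layers described next matter.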
Feed Pull Model Implementation
We employ a distributed cache that is sharded with hash‑based partitioning: each key is hashed to a shard, and a request reads only the shards that hold the keys it needs.
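A minimal sketch of hash‑based partitioning, assuming a simple CRC32‑modulo scheme. The article does not say which hash function or shard count is used, so `shard_for`, `group_by_shard`, and the key format are illustrative only.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_by_shard(keys: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys by shard so each shard is queried once per request."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        groups[shard_for(key, num_shards)].append(key)
    return dict(groups)


keys = [f"user:{uid}:latest_videos" for uid in range(1, 9)]
groups = group_by_shard(keys, num_shards=4)
```

Grouping keys first means a refresh that touches hundreds of followed accounts issues one batched read per shard rather than one read per account.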
Distributed Cache Architecture
We use a three‑level cache: L1 (hot cache, ~200 MB, LRU eviction), Master (≈4 GB), and Slave (≈6 GB). L1 handles the hottest data and can be horizontally scaled quickly during spikes. Master/Slave provide larger capacity to avoid cold‑data misses.
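The sketch below illustrates one way a tiered lookup could work. It assumes a read order of L1, then Master, then Slave, then the database, with faster tiers backfilled on a hit; the article does not spell out the order. Capacities are counted in entries rather than bytes for simplicity, and all class names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Tiny LRU cache; capacity is a number of entries, not bytes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


class TieredCache:
    """Read through L1 -> Master -> Slave -> database (assumed order)."""

    def __init__(self, l1: LRUCache, master: LRUCache, slave: LRUCache,
                 load_from_db: Callable[[str], Optional[str]]) -> None:
        self.tiers = [l1, master, slave]
        self.load_from_db = load_from_db

    def get(self, key: str) -> Optional[str]:
        for index, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                # Backfill the faster tiers that missed.
                for upper in self.tiers[:index]:
                    upper.set(key, value)
                return value
        value = self.load_from_db(key)
        if value is not None:
            for tier in self.tiers:
                tier.set(key, value)
        return value


db_calls = []

def fake_db(key: str) -> Optional[str]:
    db_calls.append(key)
    return f"value-for-{key}"

cache = TieredCache(LRUCache(2), LRUCache(4), LRUCache(6), fake_db)
first = cache.get("video:1")   # misses every tier, loads from the database
second = cache.get("video:1")  # served from L1, no database call
```

Because L1 is small, adding more L1 nodes during a spike is cheap, while the larger tiers keep colder keys from falling through to the database.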
Cache nodes are deployed across two IDC data centers (IDC‑A and IDC‑B) with master‑slave synchronization, forming an HA multi‑data‑center setup. Synchronization ensures consistent hot‑data metrics across sites.
Cache Technology Choice
We selected a customized Memcached (MC) cache over Redis because MC offers higher throughput for simple key‑value access at massive scale, despite its limitations with highly mutable data.
Elastic Scaling Platform (DCP)
Our self‑developed DCP platform provides both scheduled (peak‑hour) and on‑demand elastic scaling. When internal resources are exhausted, we seamlessly integrate Alibaba Cloud resources for hybrid‑cloud scaling.
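The hybrid‑cloud fallback can be pictured as a simple capacity plan: use internal machines first and request public‑cloud instances only for the shortfall. This is a sketch under assumed inputs, not DCP's actual scheduling logic; `plan_capacity` and its parameters are hypothetical.

```python
import math
from typing import Dict


def plan_capacity(expected_qps: int, qps_per_instance: int,
                  running: int, internal_free: int) -> Dict[str, int]:
    """Decide how many extra instances to take from each resource pool."""
    needed = math.ceil(expected_qps / qps_per_instance)
    shortfall = max(0, needed - running)
    # Prefer the internal pool; spill over to the public cloud only if needed.
    from_internal = min(shortfall, internal_free)
    from_cloud = shortfall - from_internal
    return {"internal": from_internal, "cloud": from_cloud}


plan = plan_capacity(expected_qps=10_000, qps_per_instance=1_000,
                     running=4, internal_free=3)
```

In this example ten instances are needed and four are running, so the plan takes the three free internal machines and requests three more from the cloud.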
Microservice Circuit‑Breaker Mechanism
A circuit breaker, similar to an electrical fuse, monitors each service's load against a threshold (e.g., 3,000 QPS). If a service exceeds its limit, it is temporarily disabled so that the overload cannot spread to the rest of the system. After the load drops or scaling completes, the service is restored.
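A minimal sketch of a load‑based breaker along these lines. It counts requests in one‑second windows and rejects traffic for a cooldown period once the threshold is crossed. The cooldown length, the class name, and the injectable `clock` are assumptions made for illustration; the article only gives the 3,000 QPS example threshold.

```python
import time
from typing import Callable


class LoadCircuitBreaker:
    """Trip when requests in the current one-second window exceed max_qps."""

    def __init__(self, max_qps: int = 3000, cooldown_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_qps = max_qps
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._window_start = clock()
        self._count = 0
        self._open_until = float("-inf")  # breaker starts closed

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it is rejected."""
        now = self.clock()
        if now < self._open_until:
            return False  # breaker is open: service temporarily disabled
        if now - self._window_start >= 1.0:
            # Start a fresh one-second counting window.
            self._window_start = now
            self._count = 0
        self._count += 1
        if self._count > self.max_qps:
            # Threshold exceeded: open the breaker for the cooldown period.
            self._open_until = now + self.cooldown_seconds
            return False
        return True


# Deterministic demo with a fake clock and a tiny threshold.
fake_now = [0.0]
breaker = LoadCircuitBreaker(max_qps=3, cooldown_seconds=2.0,
                             clock=lambda: fake_now[0])
burst = [breaker.allow() for _ in range(4)]
```

A caller would check `breaker.allow()` before invoking the protected service and return a degraded response when it is `False`. A production breaker would usually also account for error rates and latency, which this sketch leaves out.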
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
