
Design and Solutions for High Availability and High Concurrency in Weibo Short Video Service

The article presents a detailed analysis of Weibo's short‑video platform architecture, covering team background, business scenarios, micro‑service design, feed‑pull model, multi‑level distributed caching, multi‑datacenter HA deployment, circuit‑breaker mechanisms, and elastic scaling to achieve high availability under unpredictable traffic spikes.

Architecture Digest

May 29, 2019


Team Introduction

Weibo's video platform team, part of the R&D department, is responsible for video‑related services such as video Weibo, "Weibo Stories", short videos, and live streaming, as well as the underlying infrastructure including file storage, transcoding, scheduling, and media libraries.

Business Scenario

The short-video service must absorb sudden traffic surges triggered by hot events (for example, celebrity news). These surges can reach millions of requests per second, far more than simply adding servers by hand can keep up with.

Architecture Design – Micro‑service Structure

The system adopts a micro-service architecture. An interface layer exposes both Web APIs and internal RPC interfaces. Beneath it, a façade layer aggregates calls to vertical functional services, each of which exposes its own API. Dependent services (e.g., user-follow relationships) are accessed via RPC, and the storage layer combines cache and database.
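To make the façade layer concrete, here is a minimal sketch of one façade endpoint that fans out to several vertical services in parallel and merges their partial results into a single response. The service functions (`get_video_meta`, `get_author`, `get_play_count`), the `VideoFacade` class, and the 200 ms timeout are hypothetical stand-ins; in the real system each backend would be an RPC stub, and the source does not describe the façade's internals.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical vertical services. In production these would be RPC calls.
def get_video_meta(video_id: str) -> dict:
    return {"video_id": video_id, "duration_s": 15}


def get_author(video_id: str) -> dict:
    return {"author_id": "u42"}


def get_play_count(video_id: str) -> dict:
    return {"plays": 1024}


class VideoFacade:
    """Aggregates several vertical services into one Web API response."""

    def __init__(self, max_workers: int = 8) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._backends = [get_video_meta, get_author, get_play_count]

    def video_detail(self, video_id: str, timeout_s: float = 0.2) -> dict:
        # Call every backend concurrently so total latency is roughly the
        # slowest backend, not the sum of all of them.
        futures = [self._pool.submit(fn, video_id) for fn in self._backends]
        result: dict = {}
        for fut in futures:
            # Raises concurrent.futures.TimeoutError if a backend is too slow.
            result.update(fut.result(timeout=timeout_s))
        return result


facade = VideoFacade()
print(facade.video_detail("v1"))
```

Keeping aggregation in the façade lets each vertical service stay small and independently deployable, while clients still make one call per screen.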

Technical Challenges

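Under a traditional feed-push model, every new post is written into the inbox of each follower, so an account with a massive and unpredictable fan base turns a single post into an enormous burst of writes. The team therefore chose a feed-pull model, in which a user's feed is assembled in real time by pulling content from the accounts that user follows.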
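A quick back-of-the-envelope model shows why fan-out size matters. The sketch below counts storage operations under each model. The follower and followee counts are hypothetical illustration values, not figures from Weibo.

```python
def push_fanout_writes(followers: int, posts: int = 1) -> int:
    """Feed push: each post is written into every follower's inbox."""
    return followers * posts


def pull_reads_per_refresh(followees: int) -> int:
    """Feed pull: one feed refresh reads the outbox of every followed account."""
    return followees


# Hypothetical numbers, chosen only to show the shape of the cost.
celebrity_followers = 50_000_000
typical_followees = 500

print(push_fanout_writes(celebrity_followers))  # 50000000 inbox writes for ONE post
print(pull_reads_per_refresh(typical_followees))  # 500 outbox reads per refresh
```

Under push, the cost is proportional to the author's follower count and is paid immediately, whether or not those followers are online. Under pull, the cost is proportional to how many accounts a reader follows and is paid only when that reader refreshes, and those reads can be served largely from cache.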

Solution Comparison – Feed Push vs. Feed Pull

At scale, feed push generates a huge volume of writes, and followers' inboxes can be left inconsistent while a fan-out is still in progress. Feed pull retrieves content on demand, which better suits Weibo's massive user base and its consistency requirements.
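The core of a pull model is a merge at read time: take each followed account's outbox (already sorted newest first) and merge them into one timeline, stopping after one page. This is a minimal sketch assuming in-memory outboxes keyed by user ID and posts represented as `(timestamp, post_id)` tuples. `pull_feed` and the data layout are hypothetical, and the source does not describe Weibo's actual merge code.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# A post is (timestamp, post_id). Each outbox is sorted newest-first.
Post = Tuple[int, str]


def pull_feed(followees: List[str], outboxes: Dict[str, List[Post]],
              limit: int = 20) -> List[Post]:
    """Build a reader's timeline on demand by merging followees' outboxes."""
    streams = [outboxes.get(uid, []) for uid in followees]
    # heapq.merge with reverse=True expects every input sorted descending and
    # lazily yields a single descending stream (a k-way merge).
    merged = heapq.merge(*streams, key=lambda p: p[0], reverse=True)
    # Only materialise one page. Older posts are never touched.
    return list(islice(merged, limit))


outboxes = {
    "a": [(30, "a3"), (10, "a1")],
    "b": [(20, "b2"), (5, "b0")],
    "c": [(25, "c2")],
}
print(pull_feed(["a", "b"], outboxes, limit=3))
# [(30, 'a3'), (20, 'b2'), (10, 'a1')]
```

Because the merge is lazy, the cost of a refresh depends on the page size and the number of followees, not on how many followers any author has.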

Distributed Cache Architecture

A three-level cache (L1, Master, Slave) is employed. L1 is a small hot-data cache (about 200 MB) that uses LRU eviction to keep only the hottest items. The Master and Slave layers provide larger capacity (4 GB and 6 GB respectively), so colder data still hits the cache instead of missing and falling through to the database. Keys are sharded by hash, which lets the cache scale out horizontally and quickly.
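The read path through the three levels can be sketched as follows: check the small LRU L1 first, then the hash-sharded Master and Slave pools, and only then the database, promoting hits back into L1. This is a minimal, hypothetical sketch. Capacities are counted in items rather than bytes, `ThreeLevelCache`, `ShardedPool`, and `load_from_db` are illustrative names, and `zlib.crc32` is used for sharding because Python's built-in `hash()` for strings is randomized per process. The source does not say which hash function or backfill policy Weibo uses.

```python
import zlib
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class LRUCache:
    """L1: tiny hot-data cache with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


class ShardedPool:
    """Master or Slave layer: keys are spread across nodes by hash."""

    def __init__(self, shards: int) -> None:
        self._nodes: List[Dict[str, str]] = [{} for _ in range(shards)]

    def _node(self, key: str) -> Dict[str, str]:
        return self._nodes[zlib.crc32(key.encode()) % len(self._nodes)]

    def get(self, key: str) -> Optional[str]:
        return self._node(key).get(key)

    def set(self, key: str, value: str) -> None:
        self._node(key)[key] = value


class ThreeLevelCache:
    def __init__(self, load_from_db: Callable[[str], Optional[str]],
                 l1_capacity: int = 1000, shards: int = 4) -> None:
        self.l1 = LRUCache(l1_capacity)
        self.master = ShardedPool(shards)
        self.slave = ShardedPool(shards)
        self._db = load_from_db

    def get(self, key: str) -> Optional[str]:
        value = self.l1.get(key)
        if value is not None:
            return value
        for layer in (self.master, self.slave):
            value = layer.get(key)
            if value is not None:
                self.l1.set(key, value)  # promote the hit into L1
                return value
        value = self._db(key)  # miss at every level: go to storage
        if value is not None:
            self.master.set(key, value)
            self.slave.set(key, value)
            self.l1.set(key, value)
        return value
```

Keeping L1 tiny means it holds only the items that are truly hot right now, while the larger Master and Slave layers stop a wave of slightly colder keys from reaching the database.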

HA Multi‑Datacenter Deployment

The Master and Slave caches are deployed across two IDC sites (IDC-A and IDC-B) and kept in sync through bidirectional master-slave replication. As a result, both sites hold the same hot data and can share the load evenly.
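Bidirectional replication between two sites can be illustrated with a toy model. Each IDC applies a write locally and forwards it to its peer, serves reads only from its local copy, and does not re-forward writes it received by replication, which prevents an endless ping-pong between sites. This is a hypothetical sketch: the `IDC` class and `link` helper are illustrative, replication here is synchronous for simplicity, and concurrent writes to the same key would still need a conflict rule (such as last-writer-wins) that the source does not describe.

```python
from typing import Dict, Optional


class IDC:
    """One datacenter: a local cache plus a link to its peer IDC."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.peer: Optional["IDC"] = None

    def write(self, key: str, value: str) -> None:
        # Apply locally first, then replicate to the peer site.
        # (Synchronous here. A real system would replicate asynchronously.)
        self.cache[key] = value
        if self.peer is not None:
            self.peer.apply_replica(key, value)

    def apply_replica(self, key: str, value: str) -> None:
        # Replicated writes are NOT forwarded again, avoiding a loop.
        self.cache[key] = value

    def read(self, key: str) -> Optional[str]:
        # Reads are always served from the local site.
        return self.cache.get(key)


def link(a: IDC, b: IDC) -> None:
    a.peer, b.peer = b, a


idc_a, idc_b = IDC("IDC-A"), IDC("IDC-B")
link(idc_a, idc_b)
idc_a.write("video:1:plays", "1024")
print(idc_b.read("video:1:plays"))  # 1024
```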

Cache Technology Choice

The team uses a customized MC (Memcached) cache rather than Redis, because MC delivers higher throughput for simple key-value access. The trade-off is that it is less flexible for data that changes frequently.

Elastic Scaling (DCP Platform)

To keep costs under control, the team built its own elastic scaling platform, DCP, which combines shared on-premises machines with Alibaba Cloud resources. DCP supports both automatic scaling for sudden bursts and scheduled scaling for known peak hours.
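Combining burst scaling with scheduled scaling comes down to taking the larger of two targets. This sketch of such a decision function assumes a known per-instance QPS capacity. Everything in it is hypothetical, because the source does not describe DCP's actual policy: the function name `desired_instances`, the 30% headroom, the per-hour minimums, and the floor of 2 instances are all illustration values.

```python
import math
from typing import Dict


def desired_instances(current_qps: float, qps_per_instance: float, hour: int,
                      scheduled_min: Dict[int, int], headroom: float = 1.3,
                      floor: int = 2) -> int:
    """Hypothetical scaling target: max of burst need and peak-hour schedule."""
    # Burst scaling: enough instances for the current load plus headroom.
    burst = math.ceil(current_qps * headroom / qps_per_instance)
    # Scheduled scaling: capacity reserved for known peak hours.
    scheduled = scheduled_min.get(hour, floor)
    return max(floor, burst, scheduled)


peak_schedule = {20: 50, 21: 50}  # hypothetical evening peak
print(desired_instances(10_500, 1_000, hour=3, scheduled_min=peak_schedule))   # 14
print(desired_instances(10_500, 1_000, hour=20, scheduled_min=peak_schedule))  # 50
```

Scheduled capacity covers the predictable daily peak in advance, while the burst term reacts to hot events that no schedule can anticipate.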

Micro‑service Circuit‑Breaker Mechanism

A circuit breaker compares each service's QPS against a threshold (for example, 3000 QPS) and automatically isolates a service once it becomes overloaded. After capacity is added or the load drops, the service is allowed to recover, which keeps one overloaded service from causing cascading failures.
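A QPS-based breaker can be sketched in a few dozen lines. The source specifies only a QPS threshold (3000 in its example). The one-second fixed counting window, the cooldown period, and the half-open probing state are assumptions borrowed from the common circuit-breaker pattern, and `QpsCircuitBreaker` is a hypothetical name. An injectable clock makes the behaviour easy to test.

```python
import time
from typing import Callable


class QpsCircuitBreaker:
    """Trips when requests in a 1-second window exceed qps_threshold."""

    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, qps_threshold: int = 3000, cooldown_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.qps_threshold = qps_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self.state = self.CLOSED
        self._window_start = clock()
        self._count = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        now = self._clock()
        if self.state == self.OPEN:
            if now - self._opened_at < self.cooldown_s:
                return False  # fail fast to shield the overloaded service
            # Cooldown over: start probing with a fresh window.
            self.state = self.HALF_OPEN
            self._window_start, self._count = now, 0
        if now - self._window_start >= 1.0:
            # A full window passed without tripping, so a probing breaker closes.
            if self.state == self.HALF_OPEN:
                self.state = self.CLOSED
            self._window_start, self._count = now, 0
        self._count += 1
        if self._count > self.qps_threshold:
            self.state = self.OPEN
            self._opened_at = now
            return False
        return True
```

Callers check `allow()` before each downstream call and return a degraded response when it is `False`. That way, an overloaded dependency sheds load instead of dragging its callers down with it.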

Author: Liu Zhiyong (Weibo Video Platform Architect)


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Backend Architecture high concurrency distributed cache short video elastic scaling Weibo

