Designing a Scalable Real‑Time Video Ranking System for Billions of Views
This article walks through the complete system‑design process for a Douyin‑style video leaderboard, covering requirement analysis, functional and non‑functional specs, scale estimation, single‑machine baseline, fault‑tolerance, sharding, time‑window handling with a double‑pointer method, and caching strategies to achieve low latency, high throughput, and precise results at massive scale.
Hello, I'm Su San.
Today we dissect a frequent system‑design interview question: how to design a popular video leaderboard like Douyin?
1. Requirement Analysis
Assume a massive stream of video view data (essentially a continuous flow of VideoID records). At any moment we must accurately count the top K videos by view count within a given time window (e.g., past 1 hour, 1 day, 1 month, or the entire history). For extreme cases, the interview may specify up to 700 billion daily views and more than one hour of new video uploads per second.
1.1 Functional Requirements
Core requirement : The client can query the top K (e.g., up to 1000) videos for a specified time period.
Non‑core requirement (not considered in this design)
1.2 Non‑functional Requirements
Core
These performance requirements are critical for later technical decisions, especially the sub‑10 ms response requirement, which essentially eliminates any real‑time computation approach and forces a pre‑computation strategy.
1.3 Scale Estimation
We estimate two key metrics for design: writes per second (TPS/QPS) and total video count.
First, the throughput:
# 700 billion views per day / (≈100k seconds/day)
70B views/day / (100k seconds/day) = 700k tps70 万 TPS is huge, so we must shard traffic across many machines. Next, storage estimation:
# (1 hour of content uploaded per second / average 6‑minute video) * 100k seconds/day = 1 M new videos per day
Videos/Day = 1 hour content/second / (6 minutes content/video) * (100k seconds/day) = 1M videos/day
# 1 M videos/day * 365 days/year * 10 years ≈ 3.6 B videos
Total Videos = 1M videos/day * 365 days/year * 10 years ≈ 3.6B videosIf we only store each video ID (8 bytes) and its count (8 bytes), the naive storage is:
# 4 B videos * (8 bytes/ID + 8 bytes/count) = 64 GB64 GB can fit in memory, especially with a distributed deployment.
2. Low‑level Design
After clarifying requirements, we can present the low‑level design.
2.1 Core Entity Definition
The core business entities are straightforward:
Video (Video) : The object being ranked.
View (View) : A user’s watch event.
Time Window (Time Window) : The period over which we compute the leaderboard (e.g., 1 hour, 1 day).
2.2 System Interface Design
We need a simple API to fetch the Top K videos for a given window.
// GET request to query Top K videos for a specific window
GET /views/top?window={WINDOW}&k={K}
// Response body
{
"videos": [
{
"videoId": "...", // video ID
"views": "..." // view count
}
// ... more videos
]
}This part is routine and does not require deep discussion in an interview.
3. High‑level Design
Interviewer: "The entities and interfaces are clear. How would you solve this architecturally?"
We start with the simplest single‑machine solution, expose its bottlenecks, then iteratively evolve to a fault‑tolerant, scalable architecture.
3.1 Basic Single‑machine Solution
We maintain two in‑memory structures on a single server:
A huge HashMap where Key = VideoID and Value = view count.
A min‑heap of capacity K (e.g., 1000) to keep the current Top K videos.
Processing flow:
Consume view events from a stream (e.g., Kafka).
Atomically increment the count for the corresponding VideoID in the hash map.
Compare the updated count with the heap’s top element (the smallest count among the current Top K).
If the new count is larger, replace the heap top with the new video and re‑heapify.
Clients read directly from the heap.
This simple version suffers from two fatal problems:
Throughput bottleneck : A single machine cannot handle the estimated 700 k TPS.
Single point of failure : If the host crashes, the service disappears and all in‑memory data is lost.
3.2 Solving Single‑Point‑Failure
3.2.1 Database Counting
Persist the hash map and heap state to a database so the service becomes stateless. On failure, a new instance can reload the state.
However, this merely shifts the bottleneck to the database: every update requires at least one round‑trip, which is unacceptable for 700 k TPS, and atomicity between count updates and heap adjustments is hard to guarantee.
3.2.2 Multi‑Replica Strategy
Run multiple independent replicas, each consuming the full stream and maintaining its own hash map and heap.
Pros: Read scalability and high availability. Cons: Resource waste (each replica processes the full stream) and recovery complexity (a new replica must replay from the beginning).
3.2.3 Snapshot‑Based Replicas
Periodically snapshot each replica’s in‑memory state to object storage. A new instance can load the latest snapshot and resume consumption from the corresponding Kafka offset.
This improves recovery time but still incurs extra storage and snapshot overhead.
3.3 Scaling Write Throughput
Interviewer: "Replicas solve availability, but each still handles 700 k TPS. How do we break the write bottleneck?"
3.3.1 Fixed‑ID Sharding
Create P shards; each shard processes a subset of videos determined by shard_index = hash(video_id) % P. Each shard keeps its own hash map and min‑heap, and the same snapshot‑replica pattern applies.
Clients need an aggregation service that merges the Top K heaps from all shards to produce a global Top K.
Scalability limitation : Adding shards requires massive data migration because the modulo mapping is static. Aggregation overhead : A large number of shards means many RPC calls, which can become a new bottleneck.
3.3.2 Elastic Sharding (Recommended)
Use consistent hashing instead of simple modulo, allowing dynamic addition or removal of shards without massive data movement. New shards read from two neighboring snapshots to initialize their state.
This requires a service‑registry (e.g., ZooKeeper or etcd) to maintain the hash ring and coordinate shard changes.
4. Extensibility Design
4.1 Handling Sliding Time Windows
Interviewer: "We have an all‑time total leaderboard, but we also need sliding windows (last 1 hour, 1 day, 1 month). How would you implement them?"
Sliding windows are tricky because counts must be decremented when data expires. Below are several approaches.
4.1.1 Micro‑Bucket Strategy (Rejected)
Maintain per‑video minute buckets ( Map<[VideoID, MinuteTimestamp], Count>) and sum the last 60 buckets for a 1‑hour window, etc.
Problems: stale heap entries, huge computation cost (e.g., 43 200 minute buckets for a month), and memory explosion.
4.1.2 Refresh‑Heap Method (Complex)
Before each query, recompute the exact count for the K items in the heap and adjust the heap accordingly. Also store multiple granularities (minute, hour, day) to reduce aggregation cost, and periodically clean very old data.
Still overly complex and retains per‑minute data for a month.
4.1.3 Double‑Pointer Method (Recommended)
Interviewer: "Is there a simpler solution?"
Leverage Kafka’s ability to consume from arbitrary offsets. For each window we keep a separate hash map and Top‑K heap, and run two consumer pointers:
Leading edge : Consumes the live stream, incrementing counts for all windows.
Trailing edge : For a 1‑hour window, consumes events that are exactly 1 hour old; for a 1‑day window, consumes events 1 day old, etc., decrementing the corresponding counts.
This guarantees that each window’s hash map always holds the exact count for that window, and the heap never becomes stale.
Drawbacks:
Kafka must retain at least one month of data, increasing storage cost.
Read load quadruples (one leading edge + three trailing edges).
Heap size may need to be larger than K (e.g., 2K) to avoid frequent churn.
4.2 Massive Read Request Handling
To meet the sub‑minute latency requirement, add a cache with a 1‑minute TTL. Queries first hit the cache; if a miss occurs, the service reads the relevant heap, merges counts, and stores the result back in the cache.
5. Summary
We started from a naive single‑machine design, added multi‑replica snapshots for fault tolerance, introduced sharding (first fixed, then elastic) to achieve high write throughput, and finally adopted the double‑pointer method to maintain exact sliding‑window counts while using caching to satisfy low‑latency read requirements. The final solution balances data structures, distributed scaling, stream processing, caching, and fault tolerance to meet the demanding functional and non‑functional goals of a billion‑scale video ranking service.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
