Designing a Scalable Real‑Time Video Ranking System for Billions of Views
This article walks through the end‑to‑end system design of a TikTok‑style video leaderboard, covering functional and non‑functional requirements, traffic estimation, single‑node prototypes, fault‑tolerant replication, sharding, elastic partitioning, sliding‑window counters, and caching strategies to achieve low‑latency, high‑throughput ranking at massive scale.
1. Requirement Analysis
We need a service that can return the top K videos (e.g., up to 1000) for fixed time windows such as the last hour, day, month, and the entire history, with sub‑minute latency, exact counts, high write throughput, support for billions of videos, and cost‑effective operation.
1.1 Functional Requirements
Clients can query the top K videos for a given time window.
Supported windows are fixed: 1 hour, 1 day, 1 month, and total.
1.2 Non‑Functional Requirements
Data delay must be ≤ 1 minute.
Results must be exact, no approximations.
System must handle massive view events (high throughput).
Support a huge total video count.
Query latency should be in tens of milliseconds (10‑100 ms).
Cost must be controllable; cannot rely on unlimited hardware.
1.3 Scale Estimation
Assuming 700 billion daily views (≈ 700 k TPS) and about 3.6 billion videos, a naive storage of 8 bytes for ID and 8 bytes for count totals ~64 GB, which can fit in memory when distributed.
# 700B views/day / (≈100k seconds/day)
70B views/day / 100k seconds/day = 700k TPS # 4B videos * (8 bytes ID + 8 bytes count) = 64 GB2. Underlying Design
Core entities: Video, View (watch event), Time Window.
2.2 API Design
// GET request to query Top K videos for a window
GET /views/top?window={WINDOW}&k={K}
// Response
{
"videos": [
{"videoId": "...", "views": "..."},
...
]
}3. Upper‑Level Design
3.1 Basic Single‑Machine Solution
Maintain a large HashMap (key = VideoID, value = view count) and a min‑heap of size K to keep the current top K videos.
Consume view events from a stream (e.g., Kafka).
Atomically increment the count in the hash map.
Compare the updated count with the heap top; if larger, replace the top and re‑heapify.
Clients read directly from the heap.
3.2 Solving Single‑Point Failure
3.2.1 Database Persistence
Persist the hash map and heap state to a database so that a new instance can reload state after a crash. This introduces latency and concurrency bottlenecks at the required 700 k TPS.
3.2.2 Multi‑Replica Strategy
Run multiple identical replicas, each consuming the full stream. Reads can be load‑balanced, providing high availability, but each replica processes the full traffic, increasing cost.
3.2.3 Replica + Snapshot
Periodically snapshot each replica’s in‑memory state to object storage. New instances load the latest snapshot and resume consumption from that offset, reducing recovery time.
3.3 Scaling Write Throughput
3.3.1 Fixed Sharding by ID
Split the workload into P shards using shard_index = hash(video_id) % P. Each shard maintains its own hash map and min‑heap. An aggregation service merges the per‑shard top K lists to produce the global result.
3.3.2 Elastic Partitioning (Recommended)
Use consistent hashing so that shards can be added or removed without massive data migration. A service registry (e.g., ZooKeeper or etcd) tracks shard assignments for both consumers and the aggregation service.
4. Extensibility Design
4.1 Sliding Time Windows
4.1.1 Micro‑Bucket Strategy (Problematic)
Maintain per‑video minute buckets ( Map<[VideoID, MinuteTimestamp], Count>) and sum the last N buckets for each window. This leads to stale heap entries, huge computation, and memory explosion.
4.1.2 Refreshing Heap Data (Inefficient)
Before each query, recompute counts for the K heap entries and adjust the heap, which degrades read performance.
4.1.3 Double‑Pointer Method (Recommended)
Maintain two consumers per window: a leading edge that increments counts for incoming events, and a trailing edge that decrements counts for events exiting the window (e.g., 1 hour ago). This keeps hash maps and heaps exact without stale data.
4.2 Massive Read Request Handling
Introduce a cache with a 1‑minute TTL so that most read requests are served from memory, meeting the sub‑minute freshness requirement.
5. Summary
We progressed from a naive single‑node prototype to a fault‑tolerant, horizontally scalable architecture that uses sharding, consistent hashing, snapshot recovery, double‑pointer sliding windows, and caching to satisfy the demanding functional and non‑functional requirements of a real‑time video ranking service at billions‑scale traffic.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
