Backend Development 26 min read

Designing a Scalable Real‑Time Video Ranking System for Billions of Views

This article walks through the complete system‑design process for a Douyin‑style video leaderboard, covering requirement analysis, functional and non‑functional specs, scale estimation, single‑machine baseline, fault‑tolerance, sharding, time‑window handling with a double‑pointer method, and caching strategies to achieve low latency, high throughput, and precise results at massive scale.

Su San Talks Tech

Nov 9, 2025

Designing a Scalable Real‑Time Video Ranking System for Billions of Views

Hello, I'm Su San.

Today we dissect a frequent system‑design interview question: how to design a popular video leaderboard like Douyin?

1. Requirement Analysis

Assume a massive stream of video view data (essentially a continuous flow of VideoID records). At any moment we must accurately count the top K videos by view count within a given time window (e.g., past 1 hour, 1 day, 1 month, or the entire history). For extreme cases, the interview may specify up to 700 billion daily views and more than one hour of new video uploads per second.

1.1 Functional Requirements

Core requirement : The client can query the top K (e.g., up to 1000) videos for a specified time period.

Non‑core requirement (not considered in this design)

1.2 Non‑functional Requirements

Core

These performance requirements are critical for later technical decisions, especially the sub‑10 ms response requirement, which essentially eliminates any real‑time computation approach and forces a pre‑computation strategy.

1.3 Scale Estimation

We estimate two key metrics for design: writes per second (TPS/QPS) and total video count.

First, the throughput:

# 700 billion views per day / (≈100k seconds/day)
70B views/day / (100k seconds/day) = 700k tps

70 万 TPS is huge, so we must shard traffic across many machines. Next, storage estimation:

# (1 hour of content uploaded per second / average 6‑minute video) * 100k seconds/day = 1 M new videos per day
Videos/Day = 1 hour content/second / (6 minutes content/video) * (100k seconds/day) = 1M videos/day
# 1 M videos/day * 365 days/year * 10 years ≈ 3.6 B videos
Total Videos = 1M videos/day * 365 days/year * 10 years ≈ 3.6B videos

If we only store each video ID (8 bytes) and its count (8 bytes), the naive storage is:

# 4 B videos * (8 bytes/ID + 8 bytes/count) = 64 GB

64 GB can fit in memory, especially with a distributed deployment.

2. Low‑level Design

After clarifying requirements, we can present the low‑level design.

2.1 Core Entity Definition

The core business entities are straightforward:

Video (Video) : The object being ranked.

View (View) : A user’s watch event.

Time Window (Time Window) : The period over which we compute the leaderboard (e.g., 1 hour, 1 day).

2.2 System Interface Design

We need a simple API to fetch the Top K videos for a given window.

// GET request to query Top K videos for a specific window
GET /views/top?window={WINDOW}&k={K}

// Response body
{
  "videos": [
    {
      "videoId": "...", // video ID
      "views": "..."   // view count
    }
    // ... more videos
  ]
}

This part is routine and does not require deep discussion in an interview.

3. High‑level Design

Interviewer: "The entities and interfaces are clear. How would you solve this architecturally?"

We start with the simplest single‑machine solution, expose its bottlenecks, then iteratively evolve to a fault‑tolerant, scalable architecture.

3.1 Basic Single‑machine Solution

We maintain two in‑memory structures on a single server:

A huge HashMap where Key = VideoID and Value = view count.

A min‑heap of capacity K (e.g., 1000) to keep the current Top K videos.

Processing flow:

Consume view events from a stream (e.g., Kafka).

Atomically increment the count for the corresponding VideoID in the hash map.

Compare the updated count with the heap’s top element (the smallest count among the current Top K).

If the new count is larger, replace the heap top with the new video and re‑heapify.

Clients read directly from the heap.

This simple version suffers from two fatal problems:

Throughput bottleneck : A single machine cannot handle the estimated 700 k TPS.

Single point of failure : If the host crashes, the service disappears and all in‑memory data is lost.

3.2 Solving Single‑Point‑Failure

3.2.1 Database Counting

Persist the hash map and heap state to a database so the service becomes stateless. On failure, a new instance can reload the state.

However, this merely shifts the bottleneck to the database: every update requires at least one round‑trip, which is unacceptable for 700 k TPS, and atomicity between count updates and heap adjustments is hard to guarantee.

3.2.2 Multi‑Replica Strategy

Run multiple independent replicas, each consuming the full stream and maintaining its own hash map and heap.

Pros: Read scalability and high availability. Cons: Resource waste (each replica processes the full stream) and recovery complexity (a new replica must replay from the beginning).

3.2.3 Snapshot‑Based Replicas

Periodically snapshot each replica’s in‑memory state to object storage. A new instance can load the latest snapshot and resume consumption from the corresponding Kafka offset.

This improves recovery time but still incurs extra storage and snapshot overhead.

3.3 Scaling Write Throughput

Interviewer: "Replicas solve availability, but each still handles 700 k TPS. How do we break the write bottleneck?"

3.3.1 Fixed‑ID Sharding

Create P shards; each shard processes a subset of videos determined by shard_index = hash(video_id) % P. Each shard keeps its own hash map and min‑heap, and the same snapshot‑replica pattern applies.

Clients need an aggregation service that merges the Top K heaps from all shards to produce a global Top K.

Scalability limitation : Adding shards requires massive data migration because the modulo mapping is static. Aggregation overhead : A large number of shards means many RPC calls, which can become a new bottleneck.

3.3.2 Elastic Sharding (Recommended)

Use consistent hashing instead of simple modulo, allowing dynamic addition or removal of shards without massive data movement. New shards read from two neighboring snapshots to initialize their state.

This requires a service‑registry (e.g., ZooKeeper or etcd) to maintain the hash ring and coordinate shard changes.

4. Extensibility Design

4.1 Handling Sliding Time Windows

Interviewer: "We have an all‑time total leaderboard, but we also need sliding windows (last 1 hour, 1 day, 1 month). How would you implement them?"

Sliding windows are tricky because counts must be decremented when data expires. Below are several approaches.

4.1.1 Micro‑Bucket Strategy (Rejected)

Maintain per‑video minute buckets ( Map<[VideoID, MinuteTimestamp], Count>) and sum the last 60 buckets for a 1‑hour window, etc.

Problems: stale heap entries, huge computation cost (e.g., 43 200 minute buckets for a month), and memory explosion.

4.1.2 Refresh‑Heap Method (Complex)

Before each query, recompute the exact count for the K items in the heap and adjust the heap accordingly. Also store multiple granularities (minute, hour, day) to reduce aggregation cost, and periodically clean very old data.

Still overly complex and retains per‑minute data for a month.

4.1.3 Double‑Pointer Method (Recommended)

Interviewer: "Is there a simpler solution?"

Leverage Kafka’s ability to consume from arbitrary offsets. For each window we keep a separate hash map and Top‑K heap, and run two consumer pointers:

Leading edge : Consumes the live stream, incrementing counts for all windows.

Trailing edge : For a 1‑hour window, consumes events that are exactly 1 hour old; for a 1‑day window, consumes events 1 day old, etc., decrementing the corresponding counts.

This guarantees that each window’s hash map always holds the exact count for that window, and the heap never becomes stale.

Drawbacks:

Kafka must retain at least one month of data, increasing storage cost.

Read load quadruples (one leading edge + three trailing edges).

Heap size may need to be larger than K (e.g., 2K) to avoid frequent churn.

4.2 Massive Read Request Handling

To meet the sub‑minute latency requirement, add a cache with a 1‑minute TTL. Queries first hit the cache; if a miss occurs, the service reads the relevant heap, merges counts, and stores the result back in the cache.

5. Summary

We started from a naive single‑machine design, added multi‑replica snapshots for fault tolerance, introduced sharding (first fixed, then elastic) to achieve high write throughput, and finally adopted the double‑pointer method to maintain exact sliding‑window counts while using caching to satisfy low‑latency read requirements. The final solution balances data structures, distributed scaling, stream processing, caching, and fault tolerance to meet the demanding functional and non‑functional goals of a billion‑scale video ranking service.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed architecture System Design real-time ranking Sliding Window high-throughput

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.