Designing a Scalable Real‑Time Video Ranking System for Billions of Views

This article walks through the end‑to‑end system design of a TikTok‑style video leaderboard, covering functional and non‑functional requirements, traffic estimation, single‑node prototypes, fault‑tolerant replication, sharding, elastic partitioning, sliding‑window counters, and caching strategies to achieve low‑latency, high‑throughput ranking at massive scale.

Su San Talks Tech
Su San Talks Tech
Su San Talks Tech
Designing a Scalable Real‑Time Video Ranking System for Billions of Views

1. Requirement Analysis

We need a service that can return the top K videos (e.g., up to 1000) for fixed time windows such as the last hour, day, month, and the entire history, with sub‑minute latency, exact counts, high write throughput, support for billions of videos, and cost‑effective operation.

1.1 Functional Requirements

Clients can query the top K videos for a given time window.

Supported windows are fixed: 1 hour, 1 day, 1 month, and total.

1.2 Non‑Functional Requirements

Data delay must be ≤ 1 minute.

Results must be exact, no approximations.

System must handle massive view events (high throughput).

Support a huge total video count.

Query latency should be in tens of milliseconds (10‑100 ms).

Cost must be controllable; cannot rely on unlimited hardware.

image
image

1.3 Scale Estimation

Assuming 700 billion daily views (≈ 700 k TPS) and about 3.6 billion videos, a naive storage of 8 bytes for ID and 8 bytes for count totals ~64 GB, which can fit in memory when distributed.

# 700B views/day / (≈100k seconds/day)
70B views/day / 100k seconds/day = 700k TPS
# 4B videos * (8 bytes ID + 8 bytes count) = 64 GB

2. Underlying Design

Core entities: Video, View (watch event), Time Window.

2.2 API Design

// GET request to query Top K videos for a window
GET /views/top?window={WINDOW}&k={K}

// Response
{
  "videos": [
    {"videoId": "...", "views": "..."},
    ...
  ]
}

3. Upper‑Level Design

3.1 Basic Single‑Machine Solution

Maintain a large HashMap (key = VideoID, value = view count) and a min‑heap of size K to keep the current top K videos.

Consume view events from a stream (e.g., Kafka).

Atomically increment the count in the hash map.

Compare the updated count with the heap top; if larger, replace the top and re‑heapify.

Clients read directly from the heap.

image-1
image-1

3.2 Solving Single‑Point Failure

3.2.1 Database Persistence

Persist the hash map and heap state to a database so that a new instance can reload state after a crash. This introduces latency and concurrency bottlenecks at the required 700 k TPS.

3.2.2 Multi‑Replica Strategy

Run multiple identical replicas, each consuming the full stream. Reads can be load‑balanced, providing high availability, but each replica processes the full traffic, increasing cost.

image-2
image-2

3.2.3 Replica + Snapshot

Periodically snapshot each replica’s in‑memory state to object storage. New instances load the latest snapshot and resume consumption from that offset, reducing recovery time.

image-3
image-3

3.3 Scaling Write Throughput

3.3.1 Fixed Sharding by ID

Split the workload into P shards using shard_index = hash(video_id) % P. Each shard maintains its own hash map and min‑heap. An aggregation service merges the per‑shard top K lists to produce the global result.

image-5
image-5

3.3.2 Elastic Partitioning (Recommended)

Use consistent hashing so that shards can be added or removed without massive data migration. A service registry (e.g., ZooKeeper or etcd) tracks shard assignments for both consumers and the aggregation service.

image-6
image-6

4. Extensibility Design

4.1 Sliding Time Windows

4.1.1 Micro‑Bucket Strategy (Problematic)

Maintain per‑video minute buckets ( Map<[VideoID, MinuteTimestamp], Count>) and sum the last N buckets for each window. This leads to stale heap entries, huge computation, and memory explosion.

image-7
image-7

4.1.2 Refreshing Heap Data (Inefficient)

Before each query, recompute counts for the K heap entries and adjust the heap, which degrades read performance.

4.1.3 Double‑Pointer Method (Recommended)

Maintain two consumers per window: a leading edge that increments counts for incoming events, and a trailing edge that decrements counts for events exiting the window (e.g., 1 hour ago). This keeps hash maps and heaps exact without stale data.

image-9
image-9

4.2 Massive Read Request Handling

Introduce a cache with a 1‑minute TTL so that most read requests are served from memory, meeting the sub‑minute freshness requirement.

image-10
image-10

5. Summary

We progressed from a naive single‑node prototype to a fault‑tolerant, horizontally scalable architecture that uses sharding, consistent hashing, snapshot recovery, double‑pointer sliding windows, and caching to satisfy the demanding functional and non‑functional requirements of a real‑time video ranking service at billions‑scale traffic.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

distributed architectureshardingSystem Designcachingreal-time rankingSliding Window
Su San Talks Tech
Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.