Designing a Billion‑Scale Leaderboard for Honor of Kings: Key Strategies and Pitfalls

This article analyzes how to build a real‑time, billion‑user leaderboard for a game like Honor of Kings, explaining why traditional ORDER BY queries fail, why Redis sorted sets excel, and detailing sharding, hot‑key mitigation, memory optimization, and fault‑tolerant implementation techniques.

Programmer XiaoFu
Programmer XiaoFu
Programmer XiaoFu
Designing a Billion‑Scale Leaderboard for Honor of Kings: Key Strategies and Pitfalls

Why ORDER BY fails at billion‑scale

Using SELECT * FROM user_info ORDER BY step DESC works only for small tables. With billions of rows and high‑frequency real‑time updates, disk I/O cannot keep up with the sorting workload and the system collapses due to insufficient concurrency handling.

Redis sorted set as the core leaderboard engine

Redis zset stores each member together with a numeric score and keeps the set ordered in memory. This yields:

Fast updates ( ZADD) – sub‑millisecond latency.

Instant rank queries ( ZREVRANK) – ~0.2 ms.

Top‑N retrieval ( ZREVRANGE) – ~1 ms for the first 100 entries.

Performance comparison (billion‑scale data):

Update a single user score – Redis 0.1 ms vs MySQL 5~50 ms (index update cost).

Query user rank – Redis 0.2 ms vs MySQL 100 ms~5 s.

Get Top 100 – Redis 1 ms vs MySQL 1 s~10 s (in‑memory sort or temporary file).

High‑concurrency support – Redis >100 k ops/s vs MySQL ~1 k ops/s (requires sharding).

Scalability through sharding

Redis can split data across many instances. Example: 100 million users distributed to 10 shards, each handling 10 million records. Benefits:

Linear scaling – adding machines increases capacity and throughput.

Load distribution – reads and writes are spread across shards, avoiding a single bottleneck.

Independent scaling – hot shards can be upgraded without affecting others.

Handling high concurrency

Redis relies on three mechanisms:

In‑memory reads/writes – ~100 k× faster than disk.

Single‑threaded event loop – eliminates lock contention and context‑switch overhead.

I/O multiplexing – one thread can manage thousands of connections simultaneously.

Tests show a single Redis instance sustains >100 k QPS; a sharded cluster easily exceeds one million concurrent operations.

Challenges at billion‑scale and mitigations

Hot‑key problem

Global Top 100 queries ( ZREVRANGE leaderboard 0 99) concentrate all requests on a single key, potentially saturating CPU and bandwidth on one shard.

Multi‑level cache: JVM local cache → Redis cluster (short TTL for Top 100).

Read‑write separation with slave load balancing: master handles writes, multiple slaves serve Top 100 reads.

Sharded key design: split the leaderboard by score range, e.g. leaderboard:top1 – top 100 users. leaderboard:top2 – ranks 101‑1000. leaderboard:rest – all other users.

Querying Top 100 only accesses leaderboard:top1.

Memory explosion

Storing 100 million users with 32‑byte keys (e.g., user:123) consumes ~3.2 GB just for keys, plus scores and pointers.

Shorten key names: store the integer ID directly (e.g., 123) to use Redis’s integer encoding.

Shard storage: hash user IDs to multiple Redis instances to spread memory pressure.

Persistence risk

Redis crash may lose the latest data even with AOF (default sync every second).

Asynchronous double‑write: on score update, also write to Kafka; a consumer persists the data to MySQL for recovery.

Mixed persistence: enable both RDB snapshots and AOF to balance recovery speed and data integrity.

Divide‑and‑conquer implementation

Goal: retrieve the top 100 players of a competition (classic Top N problem).

1. Split by score interval

Define plates (Redis keys) based on score ranges:

High scores (≥ 2500) → rank:2500_2600 Mid scores (2400‑2500) → rank:2400_2500 Low scores (0‑2400) → rank:0_2400 Fetching Top 100 only scans the “gold” plate, avoiding a full‑set scan.

2. Dynamic routing

When a player’s score changes, the system automatically routes the member to the appropriate plate:

# Pseudo‑code: decide which interval a new score belongs to
if 2500 <= new_score < 2600:
    ZADD rank:2500_2600 new_score member_id
elif 2400 <= new_score < 2500:
    ZADD rank:2400_2500 new_score member_id
# …

3. Aggregation query

Traverse plates from high to low.

Read rank:2500_2600 → obtain up to 50 users.

If fewer than 100, read rank:2400_2500 to fill the remaining slots.

Merge the retrieved users and re‑rank by score (global sorting).

Lessons from prior implementations

Avoid full ZREVRANGE on massive sets – O(N) traversal blocks the Redis thread.

Watch out for hot‑key hotspots – sudden influx of high‑score users can concentrate load on a single shard.

Memory explosion and performance jitter – billions of keys may exceed memory limits; RDB/AOF fork can double memory usage.

Data migration may block services – moving scores across shards without atomic operations can cause loss or duplication.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ShardingredisHigh ConcurrencySorted SetLeaderboardTencent Interview
Programmer XiaoFu
Written by

Programmer XiaoFu

xiaofucode.com – a programmer learning guide driven by the pursuit of profit

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.