Designing a Billion‑Scale Leaderboard for Honor of Kings: Key Strategies and Pitfalls
This article analyzes how to build a real‑time, billion‑user leaderboard for a game like Honor of Kings, explaining why traditional ORDER BY queries fail, why Redis sorted sets excel, and detailing sharding, hot‑key mitigation, memory optimization, and fault‑tolerant implementation techniques.
Why ORDER BY fails at billion‑scale
Using SELECT * FROM user_info ORDER BY step DESC works only for small tables. With billions of rows and high‑frequency real‑time updates, disk I/O cannot keep up with the sorting workload and the system collapses due to insufficient concurrency handling.
Redis sorted set as the core leaderboard engine
Redis zset stores each member together with a numeric score and keeps the set ordered in memory. This yields:
Fast updates ( ZADD) – sub‑millisecond latency.
Instant rank queries ( ZREVRANK) – ~0.2 ms.
Top‑N retrieval ( ZREVRANGE) – ~1 ms for the first 100 entries.
Performance comparison (billion‑scale data):
Update a single user score – Redis 0.1 ms vs MySQL 5~50 ms (index update cost).
Query user rank – Redis 0.2 ms vs MySQL 100 ms~5 s.
Get Top 100 – Redis 1 ms vs MySQL 1 s~10 s (in‑memory sort or temporary file).
High‑concurrency support – Redis >100 k ops/s vs MySQL ~1 k ops/s (requires sharding).
Scalability through sharding
Redis can split data across many instances. Example: 100 million users distributed to 10 shards, each handling 10 million records. Benefits:
Linear scaling – adding machines increases capacity and throughput.
Load distribution – reads and writes are spread across shards, avoiding a single bottleneck.
Independent scaling – hot shards can be upgraded without affecting others.
Handling high concurrency
Redis relies on three mechanisms:
In‑memory reads/writes – ~100 k× faster than disk.
Single‑threaded event loop – eliminates lock contention and context‑switch overhead.
I/O multiplexing – one thread can manage thousands of connections simultaneously.
Tests show a single Redis instance sustains >100 k QPS; a sharded cluster easily exceeds one million concurrent operations.
Challenges at billion‑scale and mitigations
Hot‑key problem
Global Top 100 queries ( ZREVRANGE leaderboard 0 99) concentrate all requests on a single key, potentially saturating CPU and bandwidth on one shard.
Multi‑level cache: JVM local cache → Redis cluster (short TTL for Top 100).
Read‑write separation with slave load balancing: master handles writes, multiple slaves serve Top 100 reads.
Sharded key design: split the leaderboard by score range, e.g. leaderboard:top1 – top 100 users. leaderboard:top2 – ranks 101‑1000. leaderboard:rest – all other users.
Querying Top 100 only accesses leaderboard:top1.
Memory explosion
Storing 100 million users with 32‑byte keys (e.g., user:123) consumes ~3.2 GB just for keys, plus scores and pointers.
Shorten key names: store the integer ID directly (e.g., 123) to use Redis’s integer encoding.
Shard storage: hash user IDs to multiple Redis instances to spread memory pressure.
Persistence risk
Redis crash may lose the latest data even with AOF (default sync every second).
Asynchronous double‑write: on score update, also write to Kafka; a consumer persists the data to MySQL for recovery.
Mixed persistence: enable both RDB snapshots and AOF to balance recovery speed and data integrity.
Divide‑and‑conquer implementation
Goal: retrieve the top 100 players of a competition (classic Top N problem).
1. Split by score interval
Define plates (Redis keys) based on score ranges:
High scores (≥ 2500) → rank:2500_2600 Mid scores (2400‑2500) → rank:2400_2500 Low scores (0‑2400) → rank:0_2400 Fetching Top 100 only scans the “gold” plate, avoiding a full‑set scan.
2. Dynamic routing
When a player’s score changes, the system automatically routes the member to the appropriate plate:
# Pseudo‑code: decide which interval a new score belongs to
if 2500 <= new_score < 2600:
ZADD rank:2500_2600 new_score member_id
elif 2400 <= new_score < 2500:
ZADD rank:2400_2500 new_score member_id
# …3. Aggregation query
Traverse plates from high to low.
Read rank:2500_2600 → obtain up to 50 users.
If fewer than 100, read rank:2400_2500 to fill the remaining slots.
Merge the retrieved users and re‑rank by score (global sorting).
Lessons from prior implementations
Avoid full ZREVRANGE on massive sets – O(N) traversal blocks the Redis thread.
Watch out for hot‑key hotspots – sudden influx of high‑score users can concentrate load on a single shard.
Memory explosion and performance jitter – billions of keys may exceed memory limits; RDB/AOF fork can double memory usage.
Data migration may block services – moving scores across shards without atomic operations can cause loss or duplication.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
