How to Build a Billion‑User Real‑Time Leaderboard: Architecture, Tools, and Pitfalls
This article walks through the end‑to‑end design of a leaderboard that must serve over 100 million users at 100 k queries per second. It covers requirement clarification, the real‑time and accuracy challenges, technology selection (notably Redis ZSet), a four‑layer architecture, sharding, caching, and monitoring, along with practical implementation tips for achieving low latency, consistency, and cost‑effective scalability.
Clarify Requirements
When the user base exceeds 100 million and the system must handle 100 k ranking queries per second while keeping updates within minutes, four core metrics—real‑time freshness, accuracy, pressure resistance, and flexibility—directly determine success, and each must be quantified.
Key Challenges
1. Real‑time is critical
Core leaderboards (e.g., game power) must update within 5 seconds; otherwise user complaint rates rise by 15 %.
Non‑core lists can tolerate minute‑level updates but must clearly indicate the refresh interval to users.
Even asynchronous updates should feel instantaneous through front‑end effects.
2. Accuracy is the baseline
All user actions must be captured 100 % to ensure no data loss.
Score updates must be atomic to avoid concurrency errors.
After a failure, data must be recoverable and fully traceable.
Cross‑shard data sync requires a reconciliation mechanism to handle brief inconsistencies.
3. Pressure resistance under peak load
The system must sustain 100 k ranking queries per second at peak; sharding, multi‑level caching, and rate limiting (covered in the architecture below) absorb the load.
4. Flexibility for Business Iteration
Ranking rules should be configurable via JSON without redeploying, allowing dynamic weight adjustments.
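As a sketch of what such configurability might look like, the snippet below loads weights from a JSON rule and computes a weighted score. The `weights` field and the metric names (`power`, `activity`, `charm`) are illustrative assumptions, not the article's actual schema:

```python
import json

# Hypothetical rule config that operations could push without a redeploy.
RULE_JSON = '{"weights": {"power": 0.6, "activity": 0.3, "charm": 0.1}}'

def compute_score(metrics: dict, rule_json: str) -> float:
    """Weighted sum of raw metrics, with weights loaded from a JSON rule."""
    weights = json.loads(rule_json)["weights"]
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())

score = compute_score({"power": 1000, "activity": 200, "charm": 50}, RULE_JSON)
```

Because the rule lives in configuration, adjusting a weight is a config push rather than a code deployment.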
Technology Selection: “Billion‑scale” Toolset
Redis ZSet is the natural choice for real‑time ranking because of O(log N) score updates, built‑in sorting, and rich range queries.
Traditional databases or search engines cannot match this performance at the required scale.
Redis ZSet Features
O(log N) score update – ZINCRBY is atomic.
Built‑in sorting – no extra computation needed.
Rich range queries – ZREVRANGE, ZCOUNT, etc., match leaderboard needs.
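To make those semantics concrete without a live Redis server, here is a pure‑Python model of the three command families. It only mirrors behavior: Redis itself backs these commands with a skiplist to get the O(log N) guarantees, which this dict‑based sketch does not reproduce.

```python
class MiniZSet:
    """Behavioral sketch of a Redis sorted set (not performance-equivalent)."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zincrby(self, delta, member):
        # Like ZINCRBY: create-or-increment, returning the new score.
        self.scores[member] = self.scores.get(member, 0) + delta
        return self.scores[member]

    def _sorted_desc(self):
        return sorted(self.scores.items(), key=lambda kv: -kv[1])

    def zrevrank(self, member):
        # Like ZREVRANK: 0-based rank in descending score order.
        for rank, (m, _) in enumerate(self._sorted_desc()):
            if m == member:
                return rank
        return None

    def zrevrange(self, start, stop):
        # Like ZREVRANGE: inclusive descending range of (member, score).
        return self._sorted_desc()[start:stop + 1]

z = MiniZSet()
z.zincrby(100, "user:1")
z.zincrby(150, "user:2")
z.zincrby(60, "user:1")   # user:1 now at 160, back on top
```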
Architecture Decomposition: Four Layers
Data Ingestion → Computation & Sorting → Storage → Presentation.
Data Ingestion Layer
Use Kafka clusters with multi‑active deployment across regions to guarantee no data loss and low latency.
Traffic control via Kafka quotas limits per‑producer/consumer rates.
Dead‑letter queues store failed messages for later manual inspection.
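The ingestion pattern above (consume, retry, then park failures in a dead‑letter queue) can be sketched without a real Kafka cluster. The `process` handler, the message shape, and the retry count are illustrative assumptions:

```python
def consume(messages, process, max_retries=2):
    """Apply `process` to each message; after max_retries failed attempts,
    route the message to a dead-letter list for later manual inspection."""
    dead_letter = []
    for msg in messages:
        for attempt in range(max_retries + 1):
            try:
                process(msg)
                break  # handled successfully, move to next message
            except Exception:
                if attempt == max_retries:
                    dead_letter.append(msg)
    return dead_letter

def process(msg):
    # Simulated handler: a message without a score is a poison message.
    if msg.get("score") is None:
        raise ValueError("missing score")

dlq = consume([{"user": "u1", "score": 10}, {"user": "u2"}], process)
```

The key property is that a poison message never blocks the partition: it is retried a bounded number of times, then shunted aside.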
Computation & Sorting Layer
Three patterns are applied based on scenario:
Real‑time Lua scripts on Redis for high‑frequency, moderate‑size data.
-- Lua script example: atomically bump a user's score and read back the new rank
local newScore = redis.call('ZINCRBY', 'game_power_ranking:server1', 50, 'user:10086')
local rank = redis.call('ZREVRANK', 'game_power_ranking:server1', 'user:10086')
return {newScore, rank}
Batch processing with Spark + ClickHouse for hourly or daily heavy‑weight jobs.
Hybrid Flink (real‑time) + Spark (batch) for combined real‑time + historical scoring.
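The hybrid pattern boils down to one serving rule: a batch job (Spark in this design) materializes a base score per user, the streaming layer (Flink) accumulates deltas since the last batch run, and the serving score is their sum. A minimal sketch, with the two maps standing in for the batch and stream outputs:

```python
# Snapshot produced by the nightly batch job (illustrative values).
batch_base = {"user:10086": 5000, "user:10087": 4200}
# Real-time deltas accumulated by the streaming layer since that snapshot.
realtime_delta = {"user:10086": 150}

def serving_score(user: str) -> int:
    """Serving score = batch base + real-time delta since the last batch."""
    return batch_base.get(user, 0) + realtime_delta.get(user, 0)
```

When the next batch run lands, its output absorbs the accumulated deltas and the stream-side counters reset, so the two layers never double‑count for long.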
Storage Layer
Redis Cluster with sharding, multi‑level cache (Caffeine → Redis → ClickHouse/MySQL), and hot‑warm‑cold separation.
Hot data (≤7 days) stored in Redis Cluster, refreshed every 5 minutes.
Warm data in MySQL (≈3 months, <100 ms latency).
Cold data in ClickHouse/Hive (permanent, <1 s latency).
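Tier routing then reduces to a lookup by record age. A sketch matching the cutoffs above (the 90‑day boundary is an assumption standing in for "≈3 months"):

```python
from datetime import timedelta

def storage_tier(age: timedelta) -> str:
    """Route a record to hot (Redis), warm (MySQL), or cold (ClickHouse/Hive)
    storage by age, following the tier boundaries described above."""
    if age <= timedelta(days=7):
        return "hot"
    if age <= timedelta(days=90):   # assumed cutoff for "about 3 months"
        return "warm"
    return "cold"
```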
Presentation Layer
CDN‑cached static JSON for top‑20 lists, API gateway for routing and rate limiting, and globally distributed application clusters on Kubernetes with Horizontal Pod Autoscaling.
Key Implementation Details
Cross‑shard top‑100 queries are optimized by pre‑computing top‑1000 per shard and performing hierarchical merging (group‑level then global), reducing latency from >500 ms to ~80 ms while meeting P99 ≤ 200 ms requirements.
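The hierarchical merge itself is short once each shard has pre‑computed its own descending list. The sketch below uses two shards per group and tiny top lists for readability; the real design merges top‑1000 lists per shard:

```python
import heapq

def merge_top(ranked_lists, k):
    """Merge several descending (score, member) lists and keep the global top k."""
    return heapq.nlargest(k, (item for lst in ranked_lists for item in lst))

# Level 1: each shard pre-computes its own top list (top-1000 in the article).
shard_tops = [
    [(980, "u7"), (700, "u2")],
    [(950, "u9"), (640, "u4")],
    [(990, "u1"), (610, "u6")],
]

# Level 2: merge within a group first, then merge the group results globally.
group_a = merge_top(shard_tops[:2], k=3)
global_top3 = merge_top([group_a, shard_tops[2]], k=3)
```

The group‑level step bounds how many candidates each global merge must scan, which is where the latency reduction comes from.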
Data consistency is ensured through atomic Lua scripts for single‑shard operations, daily reconciliation jobs for cross‑shard totals, and detailed ELK logs that record user ID, action type, score delta, timestamp, and request ID for traceability.
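The reconciliation step can be sketched as a drift check: sum the per‑shard totals and compare against an independently maintained ledger total. The function and field names here are illustrative assumptions:

```python
def reconcile(shard_totals: dict, ledger_total: int, tolerance: int = 0) -> dict:
    """Compare the sum of per-shard score totals against an independent
    ledger total; report the drift so inconsistencies can be investigated."""
    drift = sum(shard_totals.values()) - ledger_total
    return {"drift": drift, "consistent": abs(drift) <= tolerance}

# Shards account for 20,000 points but the ledger expects 20,050: drift of -50.
report = reconcile({"shard1": 12000, "shard2": 8000}, ledger_total=20050)
```

A nonzero drift is the trigger to replay the detailed ELK logs (user ID, action type, score delta, timestamp, request ID) and locate the lost or duplicated updates.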
Monitoring covers shard health (QPS, memory, P99 latency), data consistency (real‑time total‑score drift, offline reconciliation), and user‑experience metrics (front‑end load time), with alerts triggered on threshold breaches.
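For the latency side of that monitoring, a nearest‑rank percentile over a window of samples is enough to drive the P99 alert. A sketch, with the 200 ms threshold taken from the requirement above:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

def p99_alert(samples, threshold_ms=200):
    """True when the window's P99 latency breaches the threshold."""
    return percentile(samples, 99) > threshold_ms

# 97 fast requests plus a slow tail: P99 lands on the 250 ms sample.
samples = [80] * 97 + [150, 250, 600]
```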
Design Principles Summary
Large keys must be sharded; small keys optimized for storage.
Use Redis + Lua for real‑time, Spark/Flink for batch processing.
Hierarchical merging for efficient cross‑shard queries.
Multi‑tier storage (hot, warm, cold) controls cost.
Final consistency + traceability via atomic ops, periodic reconciliation, and comprehensive logs.
Full‑stack monitoring and disaster‑recovery mechanisms keep the system alive under billion‑scale load.
NiuNiu MaTe
Joined Tencent (nicknamed "Goose Factory") through campus recruitment at a second‑tier university. Career path: Tencent → foreign firm → ByteDance → Tencent. Started as an interviewer at the foreign firm and hopes to help others.
