How to Build a Billion‑User Real‑Time Leaderboard: Architecture, Tools, and Pitfalls
This article walks through the end‑to‑end design of a leaderboard that must serve over 100 million users at 100 k queries per second. It covers requirement clarification, the real‑time and accuracy challenges, technology selection (notably Redis ZSet), a four‑layer architecture, sharding, caching, and monitoring, along with practical implementation tips for achieving low latency, consistency, and cost‑effective scalability.
Clarify Requirements
When the user base exceeds 100 million and the system must handle 100 k ranking queries per second while keeping updates within minutes, four core metrics—real‑time freshness, accuracy, pressure resistance, and flexibility—directly determine success, and each must be quantified.
Key Challenges
1. Real‑time is critical
Core leaderboards (e.g., game power) must update within 5 seconds; otherwise user complaint rates rise by 15 %.
Non‑core lists can tolerate minute‑level updates but must clearly indicate the refresh interval to users.
Even asynchronous updates should feel instantaneous through front‑end effects.
2. Accuracy is the baseline
All user actions must be captured 100 % to ensure no data loss.
Score updates must be atomic to avoid concurrency errors.
After a failure, data must be recoverable and fully traceable.
Cross‑shard data sync requires a reconciliation mechanism to handle brief inconsistencies.
3. Pressure resistance under peak load
The system must sustain 100 k ranking queries per second at peak; sharding, multi‑level caching, and rate limiting (covered in the architecture below) absorb the load.
4. Flexibility for Business Iteration
Ranking rules should be configurable via JSON without redeploying, allowing dynamic weight adjustments.
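As a sketch of what such configurability might look like, the snippet below loads weights from a JSON rule and computes a weighted score. The `weights` field and the metric names (`power`, `activity`, `charm`) are illustrative assumptions, not the article's actual schema:

```python
import json

# Hypothetical rule config that operations could push without a redeploy.
RULE_JSON = '{"weights": {"power": 0.6, "activity": 0.3, "charm": 0.1}}'

def compute_score(metrics: dict, rule_json: str) -> float:
    """Weighted sum of raw metrics, with weights loaded from a JSON rule."""
    weights = json.loads(rule_json)["weights"]
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())

score = compute_score({"power": 1000, "activity": 200, "charm": 50}, RULE_JSON)
```

Because the rule lives in configuration, adjusting a weight is a config push rather than a code deployment.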
Technology Selection: “Billion‑scale” Toolset
Redis ZSet is the natural choice for real‑time ranking because of O(log N) score updates, built‑in sorting, and rich range queries.
Traditional databases or search engines cannot match this performance at the required scale.
Redis ZSet Features
O(log N) score update – ZINCRBY is atomic.
Built‑in sorting – no extra computation needed.
Rich range queries – ZREVRANGE, ZCOUNT, etc., match leaderboard needs.
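To make those semantics concrete without a live Redis server, here is a pure‑Python model of the three command families. It only mirrors behavior: Redis itself backs these commands with a skiplist to get the O(log N) guarantees, which this dict‑based sketch does not reproduce.

```python
class MiniZSet:
    """Behavioral sketch of a Redis sorted set (not performance-equivalent)."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zincrby(self, delta, member):
        # Like ZINCRBY: create-or-increment, returning the new score.
        self.scores[member] = self.scores.get(member, 0) + delta
        return self.scores[member]

    def _sorted_desc(self):
        return sorted(self.scores.items(), key=lambda kv: -kv[1])

    def zrevrank(self, member):
        # Like ZREVRANK: 0-based rank in descending score order.
        for rank, (m, _) in enumerate(self._sorted_desc()):
            if m == member:
                return rank
        return None

    def zrevrange(self, start, stop):
        # Like ZREVRANGE: inclusive descending range of (member, score).
        return self._sorted_desc()[start:stop + 1]

z = MiniZSet()
z.zincrby(100, "user:1")
z.zincrby(150, "user:2")
z.zincrby(60, "user:1")   # user:1 now at 160, back on top
```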
Architecture Decomposition: Four Layers
Data Ingestion → Computation & Sorting → Storage → Presentation.
Data Ingestion Layer
Use Kafka clusters with multi‑active deployment across regions to guarantee no data loss and low latency.
Traffic control via Kafka quotas limits per‑producer/consumer rates.
Dead‑letter queues store failed messages for later manual inspection.
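The ingestion pattern above (consume, retry, then park failures in a dead‑letter queue) can be sketched without a real Kafka cluster. The `process` handler, the message shape, and the retry count are illustrative assumptions:

```python
def consume(messages, process, max_retries=2):
    """Apply `process` to each message; after max_retries failed attempts,
    route the message to a dead-letter list for later manual inspection."""
    dead_letter = []
    for msg in messages:
        for attempt in range(max_retries + 1):
            try:
                process(msg)
                break  # handled successfully, move to next message
            except Exception:
                if attempt == max_retries:
                    dead_letter.append(msg)
    return dead_letter

def process(msg):
    # Simulated handler: a message without a score is a poison message.
    if msg.get("score") is None:
        raise ValueError("missing score")

dlq = consume([{"user": "u1", "score": 10}, {"user": "u2"}], process)
```

The key property is that a poison message never blocks the partition: it is retried a bounded number of times, then shunted aside.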
Computation & Sorting Layer
Three patterns are applied based on scenario:
Real‑time Lua scripts on Redis for high‑frequency, moderate‑size data.
-- Lua script example: atomically bump a user's score and read back the new rank
local newScore = redis.call('ZINCRBY', 'game_power_ranking:server1', 50, 'user:10086')
local rank = redis.call('ZREVRANK', 'game_power_ranking:server1', 'user:10086')
return {newScore, rank}
Batch processing with Spark + ClickHouse for hourly or daily heavy‑weight jobs.
Hybrid Flink (real‑time) + Spark (batch) for combined real‑time + historical scoring.
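The hybrid pattern boils down to one serving rule: a batch job (Spark in this design) materializes a base score per user, the streaming layer (Flink) accumulates deltas since the last batch run, and the serving score is their sum. A minimal sketch, with the two maps standing in for the batch and stream outputs:

```python
# Snapshot produced by the nightly batch job (illustrative values).
batch_base = {"user:10086": 5000, "user:10087": 4200}
# Real-time deltas accumulated by the streaming layer since that snapshot.
realtime_delta = {"user:10086": 150}

def serving_score(user: str) -> int:
    """Serving score = batch base + real-time delta since the last batch."""
    return batch_base.get(user, 0) + realtime_delta.get(user, 0)
```

When the next batch run lands, its output absorbs the accumulated deltas and the stream-side counters reset, so the two layers never double‑count for long.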
Storage Layer
Redis Cluster with sharding, multi‑level cache (Caffeine → Redis → ClickHouse/MySQL), and hot‑warm‑cold separation.
Hot data (≤7 days) stored in Redis Cluster, refreshed every 5 minutes.
Warm data in MySQL (≈3 months, <100 ms latency).
Cold data in ClickHouse/Hive (permanent, <1 s latency).
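Tier routing then reduces to a lookup by record age. A sketch matching the cutoffs above (the 90‑day boundary is an assumption standing in for "≈3 months"):

```python
from datetime import timedelta

def storage_tier(age: timedelta) -> str:
    """Route a record to hot (Redis), warm (MySQL), or cold (ClickHouse/Hive)
    storage by age, following the tier boundaries described above."""
    if age <= timedelta(days=7):
        return "hot"
    if age <= timedelta(days=90):   # assumed cutoff for "about 3 months"
        return "warm"
    return "cold"
```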
Presentation Layer
CDN‑cached static JSON for top‑20 lists, API gateway for routing and rate limiting, and globally distributed application clusters on Kubernetes with Horizontal Pod Autoscaling.
Key Implementation Details
Cross‑shard top‑100 queries are optimized by pre‑computing top‑1000 per shard and performing hierarchical merging (group‑level then global), reducing latency from >500 ms to ~80 ms while meeting P99 ≤ 200 ms requirements.
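The hierarchical merge itself is short once each shard has pre‑computed its own descending list. The sketch below uses two shards per group and tiny top lists for readability; the real design merges top‑1000 lists per shard:

```python
import heapq

def merge_top(ranked_lists, k):
    """Merge several descending (score, member) lists and keep the global top k."""
    return heapq.nlargest(k, (item for lst in ranked_lists for item in lst))

# Level 1: each shard pre-computes its own top list (top-1000 in the article).
shard_tops = [
    [(980, "u7"), (700, "u2")],
    [(950, "u9"), (640, "u4")],
    [(990, "u1"), (610, "u6")],
]

# Level 2: merge within a group first, then merge the group results globally.
group_a = merge_top(shard_tops[:2], k=3)
global_top3 = merge_top([group_a, shard_tops[2]], k=3)
```

The group‑level step bounds how many candidates each global merge must scan, which is where the latency reduction comes from.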
Data consistency is ensured through atomic Lua scripts for single‑shard operations, daily reconciliation jobs for cross‑shard totals, and detailed ELK logs that record user ID, action type, score delta, timestamp, and request ID for traceability.
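The reconciliation step can be sketched as a drift check: sum the per‑shard totals and compare against an independently maintained ledger total. The function and field names here are illustrative assumptions:

```python
def reconcile(shard_totals: dict, ledger_total: int, tolerance: int = 0) -> dict:
    """Compare the sum of per-shard score totals against an independent
    ledger total; report the drift so inconsistencies can be investigated."""
    drift = sum(shard_totals.values()) - ledger_total
    return {"drift": drift, "consistent": abs(drift) <= tolerance}

# Shards account for 20,000 points but the ledger expects 20,050: drift of -50.
report = reconcile({"shard1": 12000, "shard2": 8000}, ledger_total=20050)
```

A nonzero drift is the trigger to replay the detailed ELK logs (user ID, action type, score delta, timestamp, request ID) and locate the lost or duplicated updates.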
Monitoring covers shard health (QPS, memory, P99 latency), data consistency (real‑time total‑score drift, offline reconciliation), and user‑experience metrics (front‑end load time), with alerts triggered on threshold breaches.
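For the latency side of that monitoring, a nearest‑rank percentile over a window of samples is enough to drive the P99 alert. A sketch, with the 200 ms threshold taken from the requirement above:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

def p99_alert(samples, threshold_ms=200):
    """True when the window's P99 latency breaches the threshold."""
    return percentile(samples, 99) > threshold_ms

# 97 fast requests plus a slow tail: P99 lands on the 250 ms sample.
samples = [80] * 97 + [150, 250, 600]
```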
Design Principles Summary
Large keys must be sharded; small keys optimized for storage.
Use Redis + Lua for real‑time, Spark/Flink for batch processing.
Hierarchical merging for efficient cross‑shard queries.
Multi‑tier storage (hot, warm, cold) controls cost.
Final consistency + traceability via atomic ops, periodic reconciliation, and comprehensive logs.
Full‑stack monitoring and disaster‑recovery mechanisms keep the system alive under billion‑scale load.
NiuNiu MaTe
Joined Tencent (nicknamed "Goose Factory") through campus recruitment at a second‑tier university. Career path: Tencent → foreign firm → ByteDance → Tencent. Started as an interviewer at the foreign firm and hopes to help others.
