
Designing a Billion‑User Real‑Time Leaderboard: Redis vs MySQL

This article explores how to build a scalable, high‑performance leaderboard for hundreds of millions of users, comparing the traditional database ORDER BY approach with Redis sorted sets, addressing challenges such as hot keys, memory pressure, and persistence risks, and presenting a divide‑and‑conquer implementation strategy.


Introduction

Interviewers often ask how to design a billion‑user leaderboard (e.g., for a game like King of Glory). This guide discusses the shortcomings of using a relational database <code>ORDER BY</code> at such scale and explains why Redis is the preferred solution.

1. Why ORDER BY Fails at Scale

For small tables, a simple query like:

<code>SELECT * FROM user_info ORDER BY step DESC;</code>

works fine. However, with hundreds of millions of rows and high‑frequency updates, disk‑bound sorting and concurrency limits cause the system to collapse.

In a real‑time, high‑concurrency scenario with billions of users, the disk cannot keep up: sorting becomes prohibitively slow, and concurrent writes overwhelm the database.

2. Why Redis Is the "Champion" for Leaderboards

Redis's <code>zset</code> (sorted set) stores each member with a <code>score</code> and maintains the order in memory, making it ideal for fast, real‑time ranking.

[Figure: Redis sorted set illustration]

Redis offers three key advantages:

All data resides in memory, eliminating disk I/O.

Simple API for adding scores (<code>ZADD</code>) and retrieving ranks (<code>ZREVRANK</code>, <code>ZREVRANGE</code>).

High throughput: single‑threaded, lock‑free operations with I/O multiplexing.
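The semantics of these commands can be illustrated with a tiny in‑memory stand‑in (plain Python, no Redis server required; the <code>Leaderboard</code> class and its method names mirror the commands but are not the redis‑py API):

```python
# Minimal in-memory stand-in for a Redis sorted set (zset),
# illustrating ZADD / ZREVRANK / ZREVRANGE semantics.
# Note: Redis breaks score ties lexicographically; this sketch does not.
class Leaderboard:
    def __init__(self):
        self.scores = {}  # member -> score

    def zadd(self, member, score):
        self.scores[member] = score

    def _desc(self):
        # Members ordered by score, highest first (ZREVRANGE order).
        return sorted(self.scores, key=lambda m: -self.scores[m])

    def zrevrank(self, member):
        # 0-based rank of a member, highest score = rank 0.
        return self._desc().index(member)

    def zrevrange(self, start, stop):
        # Inclusive bounds, like Redis: ZREVRANGE key start stop.
        return self._desc()[start:stop + 1]

lb = Leaderboard()
lb.zadd("alice", 2550)
lb.zadd("bob", 2480)
lb.zadd("carol", 2600)
print(lb.zrevrank("alice"))   # 1 (carol holds rank 0)
print(lb.zrevrange(0, 1))     # ['carol', 'alice']
```

In real deployments the same three calls go straight to Redis, and each one is O(log N) or better per member touched.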

2.1 Sorting Speed

Typical operation latencies:

Update score (<code>ZADD</code>) – sub‑millisecond.

Read rank (<code>ZREVRANK</code>) – sub‑millisecond.

Fetch Top 100 (<code>ZREVRANGE</code>) – ~1 ms.

By contrast, MySQL <code>ORDER BY</code> on the same data can take 5 ms to several seconds per operation.

2.2 Strong Scalability

Redis can shard data across multiple instances, turning a 1 billion‑user set into ten 100 million‑user shards, achieving linear capacity growth and isolated hot‑spot handling.

Imagine splitting a warehouse’s cargo into ten trucks; each truck moves only a fraction of the load, dramatically speeding up loading and unloading.
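The shard routing can be sketched as a simple hash route (a sketch, assuming ten shards and CRC32 as the hash function; the key names are illustrative):

```python
import zlib

NUM_SHARDS = 10  # e.g. ten 100-million-user shards

def shard_key(user_id: int, base: str = "leaderboard") -> str:
    """Route a user to one of NUM_SHARDS sorted-set keys via CRC32."""
    shard = zlib.crc32(str(user_id).encode()) % NUM_SHARDS
    return f"{base}:{shard}"

# The same user always lands on the same shard, so updates and
# rank reads for that user hit exactly one Redis instance.
print(shard_key(123456789))
```

Because the route is deterministic, no lookup table is needed; any application server can compute the target shard locally.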

2.3 Handling High Concurrency

Redis combines three techniques:

In‑memory reads/writes (roughly 10⁵ times faster than disk).

Single‑threaded execution (no lock contention).

I/O multiplexing (one thread serves thousands of connections).

Tests show a single Redis node can sustain >100 k QPS, and a clustered setup easily reaches millions of QPS.

3. Potential Issues with Redis at Billion‑User Scale and Solutions

3.1 Hot‑Key Problem

Frequent queries like <code>ZREVRANGE leaderboard 0 99</code> concentrate load on a single key, risking CPU and bandwidth saturation.

All players requesting the global Top 100 hit the same key, which can crash the shard.

Mitigations:

Multi‑level caching (local JVM cache → Redis cluster → persistent store).

Read‑write splitting with replica reads.

Key sharding: split the leaderboard into multiple keys (e.g., <code>leaderboard:top1</code>, <code>leaderboard:top2</code>, <code>leaderboard:rest</code>).

3.2 Memory Explosion

Storing 100 million keys at 32 bytes each already consumes ~3.2 GB, not counting scores and pointers.

Optimizations:

Shorten key names (store numeric IDs instead of <code>user:123</code>).

Shard data across multiple Redis instances.

3.3 Persistence Risks

A Redis crash can lose recent updates even with AOF enabled: the default <code>appendfsync everysec</code> policy can drop up to one second of writes.

Disaster‑recovery strategies:

Asynchronous dual‑write to Kafka, then persist to MySQL.

Hybrid persistence (both RDB snapshots and AOF) for balanced recovery speed and data safety.
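Hybrid persistence can be enabled with directives along these lines (a sketch of the relevant <code>redis.conf</code> settings, available since Redis 4.0; tune thresholds to your recovery objectives):

```
# redis.conf — hybrid persistence sketch
appendonly yes                # enable AOF
appendfsync everysec          # fsync once per second (may lose <= 1 s of writes)
aof-use-rdb-preamble yes      # write an RDB snapshot as the AOF preamble
save 900 1                    # plus periodic standalone RDB snapshots
save 300 10
```

The RDB preamble makes restarts fast (bulk snapshot load), while the AOF tail replays only the writes since the last rewrite.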

4. Divide‑and‑Conquer Implementation

To fetch the top 100 players efficiently, split the ranking space into intervals ("plates") and query only the relevant plates.

4.1 Interval Splitting

Example key naming:

rank:2500_2600

– high‑score players.

rank:2400_2500

– mid‑score players.

rank:0_2400

– low‑score players.

When requesting Top 100, only the high‑score plate needs to be scanned.

4.2 Dynamic Routing

When a player's score changes, the system automatically determines the correct plate:

<code># Route an updated score to the correct plate (sorted-set key)
def plate_for(score):
    if 2500 <= score < 2600:
        return "rank:2500_2600"
    elif 2400 <= score < 2500:
        return "rank:2400_2500"
    else:
        return "rank:0_2400"

# e.g. redis.zadd(plate_for(new_score), {user_id: new_score})</code>

4.3 Aggregated Query

Steps to assemble the final Top 100:

Query the high‑score plate → get first 50.

If needed, query the mid‑score plate → fill remaining slots.

Merge results and re‑sort globally.
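The steps above can be sketched as a merge over per‑plate results (a sketch; each input list stands in for the (member, score) pairs returned by a <code>ZREVRANGE ... WITHSCORES</code> on one plate):

```python
def merge_top_n(plates, n=100):
    """Merge per-plate (member, score) lists and re-sort globally."""
    merged = [pair for plate in plates for pair in plate]
    merged.sort(key=lambda ms: ms[1], reverse=True)  # global re-sort by score
    return merged[:n]

high = [("carol", 2590), ("alice", 2510)]   # from rank:2500_2600
mid  = [("bob", 2480), ("dave", 2430)]      # from rank:2400_2500
print(merge_top_n([high, mid], n=3))
# [('carol', 2590), ('alice', 2510), ('bob', 2480)]
```

Since plates are queried from the highest score band down, the merge usually touches only one or two plates before the Top 100 is filled.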

5. Lessons Learned from Real‑World Deployments

Avoid full‑range <code>ZREVRANGE</code> on massive sets; it incurs O(N) traversal.

Watch out for "dark‑horse" users who suddenly jump to top scores and create hot shards.

Memory usage can double during RDB/AOF persistence due to forked processes.

Data migration across shards must be atomic to prevent loss or duplication.

Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
