Designing a Billion‑User Social Graph for 1 Million QPS at JD
The article presents a complete backend design for JD's social‑relationship scenario at 1 million (100W) QPS, covering data modeling, MySQL sharding, Redis data structures, multi‑level caching, big‑key mitigation, write‑ and read‑path optimization, pagination, consistency guarantees, and query strategies for both "following" and "followers" lists.
Overall Architecture
Target: 1 million QPS for hundreds of millions of users. The design combines MySQL horizontal sharding (16 databases × 1024 tables), a Redis cluster (initially 16 shards, later expanded to 32), and a three‑level cache (local Caffeine, Redis, MySQL) to achieve sub‑millisecond latency.
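For orientation, a minimal routing sketch for the 16‑database × 1024‑table layout; the modulo scheme and the ShardRouter name are assumptions for illustration, not taken from the article:
// Route a userId to its database and table (assumed modulo scheme)
public final class ShardRouter {
    private static final int DB_COUNT = 16;
    private static final int TABLE_COUNT = 1024;

    public static String route(long userId) {
        long hash = userId & Long.MAX_VALUE;     // guard against negative ids
        int db = (int) (hash % DB_COUNT);        // which database
        int table = (int) (hash % TABLE_COUNT);  // which table within it
        return "db_" + db + ".user_relation_" + table;
    }
}
// e.g. ShardRouter.route(10086L) -> "db_6.user_relation_870"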
Core Data Model
Relationship semantics
Direction: following, follower (fan), bidirectional – each edge is stored as two redundant rows so either side can be read without a join.
Status: active, cancelled, blocked – soft delete keeps the history.
Time: create_time, update_time – supports temporal queries.
MySQL table
CREATE TABLE user_relation_ (
  user_id BIGINT NOT NULL COMMENT 'user A',
  target_id BIGINT NOT NULL COMMENT 'user B',
  relation_type TINYINT COMMENT '1 = A→B, 2 = B→A, 3 = bidirectional',
  state TINYINT COMMENT '0 = cancelled, 1 = active, 2 = blocked',
  create_time DATETIME,
  update_time DATETIME,
  PRIMARY KEY (user_id, target_id),
  KEY idx_target (target_id)
) ENGINE=InnoDB
PARTITION BY HASH(user_id) PARTITIONS 1024;
Redis structures
following:{userId} – Set of targetIds, never expires.
followers:{userId} – Set of fanIds, never expires.
cnt:following:{userId} – String counter, never expires, synced to MySQL asynchronously.
cnt:followers:{userId} – String counter, same policy.
Concurrent Write Path (Follow Example)
Atomic update via a Redis Lua script:
// NOTE: in Redis Cluster the four keys must share a hash slot (hash tags) for the script to run
String lua =
    "if redis.call('SISMEMBER', KEYS[1], ARGV[1]) == 1 then " +
    "  return 0 " +                               // already following: no-op
    "else " +
    "  redis.call('SADD', KEYS[1], ARGV[1]); " +  // following:{userId} += targetId
    "  redis.call('SADD', KEYS[2], ARGV[2]); " +  // followers:{targetId} += userId
    "  redis.call('INCR', KEYS[3]); " +           // bump both counters
    "  redis.call('INCR', KEYS[4]); " +
    "  return 1 " +
    "end";
Long ok = redis.execute(lua,
        Arrays.asList("following:" + userId, "followers:" + targetId,
                      "cnt:following:" + userId, "cnt:followers:" + targetId),
        targetId.toString(), userId.toString());
if (ok != null && ok == 1) {                      // null guard before unboxing
    rocketMQ.syncSend("follow-topic", new FollowEvent(userId, targetId, 1));
}
Concurrent Read Path
Cache‑penetration protection
// Bloom filter pre-loaded with all userIds (~1.2 GB for 10 billion bits)
if (!bloomFilter.mightContain(targetId)) {
    return false; // unknown user: reject before touching the cache
}
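A minimal Guava‑based sketch of building the filter, assuming roughly 700 M users at a 0.1 % false‑positive rate (which lands near the ~10 billion bits / 1.2 GB figure above); userIdSource, a stream of all user IDs loaded from MySQL, is hypothetical:
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// ~700 M expected insertions at 0.1% FPP needs about 10^10 bits (~1.2 GB)
BloomFilter<Long> bloomFilter = BloomFilter.create(
        Funnels.longFunnel(), 700_000_000L, 0.001);
userIdSource.forEach(bloomFilter::put);  // hypothetical full preload of every userId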
Three-level cache
L1: local Caffeine, hit rate ≈ 70%, 5 s TTL for hot V users.
L2: Redis, cumulative hit rate ≈ 99%.
L3: MySQL, the 100% fallback, kept in sync via binlog replay.
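A sketch of the three-level lookup with back-fill on miss; redis (a Spring-style template) and userRelationMapper are assumed beans, and selectFollowing is an illustrative DAO method:
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.Set;
import java.util.stream.Collectors;

// L1 (Caffeine) -> L2 (Redis) -> L3 (MySQL), back-filling Redis on a miss
private final Cache<String, Set<Long>> l1 = Caffeine.newBuilder()
        .maximumSize(100_000)
        .expireAfterWrite(Duration.ofSeconds(5))  // short TTL for hot V users
        .build();

public Set<Long> getFollowing(long userId) {
    String key = "following:" + userId;
    return l1.get(key, k -> {
        Set<String> cached = redis.opsForSet().members(k);            // L2
        if (cached != null && !cached.isEmpty()) {
            return cached.stream().map(Long::valueOf).collect(Collectors.toSet());
        }
        Set<Long> rows = userRelationMapper.selectFollowing(userId);  // L3
        rows.forEach(id -> redis.opsForSet().add(k, id.toString()));  // back-fill
        return rows;
    });
}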
Capacity & Sharding Strategy
Redis
16‑shard cluster, each shard 64 GB → total 1 TB.
When a V‑user exceeds 10 k fans, split the set into followers:{userId}:{shardId}, where shardId = fanId % 32.
MySQL
16 databases × 1024 tables = 16 384 tables.
Each table < 50 M rows, keeping B+ tree index depth ≤ 3.
Data older than 6 months migrated to TiDB/ODPS.
BigKey Risk & Mitigation
Detection
// Sample every 10 min; followers:{userId} is a Set, so use SCARD
if (redis.scard("followers:" + userId) > 10000) {
    alertBigKey(userId);
}
Mitigation steps
Hash‑based horizontal splitting into 32 buckets.
Read‑write routing per bucket (see the sketch after this list).
Asynchronous migration of hot keys.
Secondary dispersion to 256 micro‑buckets when needed.
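A hedged sketch of the per-bucket routing mentioned above; the helper name and client calls are illustrative:
// Route each fan to one of 32 buckets; reads and writes touch exactly one bucket per fan
String shardedFollowersKey(long userId, long fanId) {
    int bucket = (int) (fanId % 32);
    return "followers:" + userId + ":" + bucket;
}
// write: redis.opsForSet().add(shardedFollowersKey(userId, fanId), String.valueOf(fanId));
// membership check: one SISMEMBER against that single bucket, never the whole set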
Effect validation
Key size: one key with 10 k elements → 32 keys of ≈ 310 elements each.
Single‑call block: 50 ms → < 1 ms.
Failure rate: 5 % → 0.1 %.
High‑Concurrency Plan of Attack (1M QPS)
Read‑Write Separation
Write path: Redis master → RocketMQ (async) → MySQL master.
Read path: Local cache → Redis slave → MySQL slave (fallback).
Redis Cluster Scaling
Expand shards from 16 to 32, reducing per‑shard QPS from 62,500 to 31,250.
Isolate hot V users into 8 dedicated shards.
Configure each master with 2 slaves for read off‑loading.
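A sketch of how hot-V isolation could route keys; whether the 8 dedicated shards sit inside or beside the 32 general shards is not specified, so the sketch assumes a separate pool, and hotUsers is an assumed precomputed set of V-user ids:
// General users spread over 32 shards; hot V users pinned to 8 dedicated shards
int routeShard(long userId) {
    if (hotUsers.contains(userId)) {
        return 32 + (int) (userId % 8);   // shards 32..39 reserved for V users
    }
    return (int) (userId % 32);           // shards 0..31 for everyone else
}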
Batch Request Optimization
Batch follow‑status checks (≤20 targets) via pipelined SISMEMBER, and batch MySQL writes (≥100 events) in a consumer.
public List<Boolean> batchCheckFollow(Long userId, List<Long> targetIds) {
    // One round trip for up to 20 SISMEMBER calls
    List<Object> results = redis.pipelined(p -> {
        targetIds.forEach(t -> p.sIsMember("following:" + userId, t.toString()));
    }).get();
    return results.stream()
            .map(res -> Boolean.TRUE.equals(res))  // null-safe mapping
            .collect(Collectors.toList());
}

@RabbitListener(queues = "follow-event-queue")
public void batchProcessFollowEvent(List<FollowEvent> events) {
    Map<Long, List<FollowEvent>> groups = events.stream()
            .collect(Collectors.groupingBy(FollowEvent::getUserId));
    groups.forEach((uid, list) -> {
        List<FollowDO> dos = list.stream()
                .map(e -> new FollowDO(e.getUserId(), e.getTargetId(), e.getRelationType()))
                .collect(Collectors.toList());
        // Chunk inserts at 100 rows; the final partial chunk is flushed too, so no events are lost
        for (int i = 0; i < dos.size(); i += 100) {
            userRelationMapper.batchInsert(dos.subList(i, Math.min(i + 100, dos.size())));
        }
    });
}
Multi‑Level Cache Upgrade (supporting 500k QPS)
L1: Caffeine (requests routed by consistent hashing so a user always hits the same node) – hot V follow status/lists, TTL 30 s, target hit rate 40%.
L2: Redis slave – regular user data, no expiry, target hit rate 29.5%.
L3: MySQL slave – degraded-scenario data, target hit rate 0.5%.
Write‑Side Throttling & Asynchrony
The application updates Redis atomically and then publishes a RocketMQ message; the consumer batches up to 100 events or flushes every 100 ms before writing to MySQL (see the batching sketch after the code below).
public boolean asyncFollow(Long userId, Long targetId) {
    String lua = "if redis.call('SISMEMBER', KEYS[1], ARGV[1]) == 1 then return 0 else " +
            "redis.call('SADD', KEYS[1], ARGV[1]); " +
            "redis.call('SADD', KEYS[2], ARGV[2]); " +
            "redis.call('INCR', KEYS[3]); " +
            "redis.call('INCR', KEYS[4]); " +
            "return 1 end";
    Long result = redis.execute(lua,
            Arrays.asList("following:" + userId, "followers:" + targetId,
                          "cnt:following:" + userId, "cnt:followers:" + targetId),
            targetId.toString(), userId.toString());
    if (result != null && result == 1) {
        FollowEvent ev = new FollowEvent(userId, targetId, 1);
        // Partition by userId so one user's events stay ordered on the consumer side
        rocketMQ.syncSendOrderly("follow-topic", ev, String.valueOf(userId % 100));
        return true;
    }
    return false;
}
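A hedged sketch of the 100-events-or-100-ms flush on the consumer side; the buffering is illustrative (a background timer would also be needed to flush an idle buffer), and FollowDO/userRelationMapper are the names used earlier:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

private final List<FollowEvent> buffer = new ArrayList<>();
private long lastFlush = System.currentTimeMillis();

public synchronized void onFollowEvent(FollowEvent ev) {
    buffer.add(ev);
    // Flush on size (100 events) or age (100 ms), whichever comes first
    if (buffer.size() >= 100 || System.currentTimeMillis() - lastFlush >= 100) {
        List<FollowDO> rows = buffer.stream()
                .map(e -> new FollowDO(e.getUserId(), e.getTargetId(), e.getRelationType()))
                .collect(Collectors.toList());
        userRelationMapper.batchInsert(rows);  // one multi-row INSERT
        buffer.clear();
        lastFlush = System.currentTimeMillis();
    }
}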
Querying "My Follow" and "Followers"
Core model recap
following:{userId} – Set of targetIds.
followers:{userId} – Set of fanIds.
Query design
My Follow List: read following:{userId} from Redis, fallback to MySQL.
Followers List: read followers:{userId} (or its sharded keys) and merge.
Pagination & Cursor
// Cursor fields
lastId = the largest id returned on the previous page
size = page size (≤ 50)
Convert the Redis Set to a ZSet sorted by follow timestamp; the timestamp then serves as the cursor, supporting both forward and backward paging.
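A cursor-paging sketch over the timestamp ZSet, shown here with the Jedis client; the method name and client choice are assumptions, and members are targetIds scored by follow time:
import java.util.ArrayList;
import java.util.List;
import redis.clients.jedis.Jedis;

// Page "my follow" newest-first; pass the last returned score back as the next cursor
public List<String> pageFollowing(Jedis jedis, long userId, Double cursor, int size) {
    String key = "following:" + userId;
    String max = (cursor == null) ? "+inf" : "(" + cursor;  // "(" = exclusive bound
    return new ArrayList<>(
            jedis.zrevrangeByScore(key, max, "-inf", 0, Math.min(size, 50)));
}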
Consistency & Degradation
Cache miss → async back‑fill from MySQL → write back to Redis for next hit.
Incremental binlog sync keeps Redis within <1 s of MySQL.
Followers of Big V Users (Post‑Sharding)
After splitting a big V's follower set into 32 shards, a query must broadcast SMEMBERS to all 32 shards, merge the results, and paginate.
Sharded key space
followers:{userId}:0 // fanId % 32 = 0
followers:{userId}:1 // fanId % 32 = 1
...
followers:{userId}:31
Parallel retrieval
Use CompletableFuture to issue 32 SMEMBERS calls concurrently; total latency stays close to a single round trip (<1 ms). See the sketch below.
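A hedged sketch of the fan-out and merge; the redis template and the ioExecutor thread pool are assumed beans:
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Fan out one SMEMBERS per shard, then flatten and sort for cursor slicing
public List<Long> fetchAllFollowers(long userId) {
    List<CompletableFuture<Set<String>>> futures = new ArrayList<>();
    for (int shard = 0; shard < 32; shard++) {
        String key = "followers:" + userId + ":" + shard;
        futures.add(CompletableFuture.supplyAsync(
                () -> redis.opsForSet().members(key), ioExecutor));
    }
    return futures.stream()
            .flatMap(f -> f.join().stream())  // join() waits for each shard
            .map(Long::valueOf)
            .sorted()                         // stable order for cursor paging
            .collect(Collectors.toList());
}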
Merge & Pagination
Stream-merge the 32 sets, sort by fanId (or timestamp), then slice by the cursor.
Hole filling
If a shard returns fewer elements than the page size, fetch the missing rows from MySQL, write them back to the corresponding Redis shard, and include them in the current response. Subsequent pages will no longer have the hole.
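A hedged sketch of the back-fill; readShardPage and selectFollowerShard are hypothetical helpers standing in for the shard read and the MySQL lookup:
// Top up a short shard page from MySQL and repair the Redis shard in passing
List<Long> page = readShardPage(userId, shard, cursor, size);
if (page.size() < size) {
    List<Long> missing = userRelationMapper.selectFollowerShard(
            userId, shard, cursor, size - page.size());
    missing.forEach(fan -> redis.opsForSet()
            .add("followers:" + userId + ":" + shard, fan.toString()));  // write back
    page.addAll(missing);
}
return page;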
Summary
The design combines redundant MySQL records, hash-based Redis sharding, three-level caching, Bloom-filter protection, and asynchronous write pipelines to achieve sub-millisecond latency for follower queries and sustain 1 million QPS. BigKey risks are contained through hash splitting, micro-bucket dispersion, and proactive hole filling, keeping performance stable even for billion-scale social graphs.