Databases 10 min read

How to Compute Mutual Friends for 100M Users in Seconds with Redis

This article breaks down the massive‑scale challenge of calculating mutual friends for hundreds of millions of users, analyzes data‑size, latency, and memory constraints, and presents three Redis‑based solutions—basic Set, bitmap, and a sharded architecture—complete with code snippets, performance numbers, and practical trade‑offs.

ITPUB
ITPUB
ITPUB
How to Compute Mutual Friends for 100M Users in Seconds with Redis

Problem Analysis

Social platforms with 100 million users and an average of 500 friends per user generate ~500 billion friendship edges. A naïve relational join would take >1 hour, far exceeding the millisecond‑level latency expected for “mutual friends” queries.

Core Challenges

Data explosion : 500 billion edges require terabytes of memory.

Real‑time latency : Users expect sub‑second responses.

Memory cost : Simple in‑memory structures consume TB‑scale RAM.

Solution 1 – Set (Bronze)

Design

Store each user's friend list in a Redis Set with key user:{id}:friends.

Use the native SINTER command to compute intersections.

# User 1001's friends: 2001 2002 2003
SADD user:1001:friends 2001 2002 2003
# User 1002's friends: 2001 2003 2004
SADD user:1002:friends 2001 2003 2004
# Mutual friends (returns 2001 2003)
SINTER user:1001:friends user:1002:friends

Drawbacks

Each set element carries ~80 bytes of metadata; 500 billion edges would need >3 TB RAM. SINTER runs in O(N·M) time and can block the Redis event loop, harming latency.

Solution 2 – Bitmap (Gold)

Principle

# User 1001's friends: 2001, 2002 (bit offsets)
SETBIT user:1001 2001 1
SETBIT user:1001 2002 1
# User 1002's friends: 2001, 2003
SETBIT user:1002 2001 1
SETBIT user:1002 2003 1
# Intersection
BITOP AND common_friends user:1001 user:1002
BITCOUNT common_friends   # count of mutual friends

Benefits : Memory drops by ~98 % – 100 million users need only ~125 MB because each bitmap stores one bit per possible ID.

Limitation : BITOP works only when both keys reside on the same Redis node; in a clustered environment keys may be sharded across nodes, making cross‑node intersection impossible.

Solution 3 – Sharded Architecture (King)

Implementation Sketch

Assign users to fixed hash slots so that many user pairs share the same node. If they fall on different nodes, perform a two‑stage computation: each node returns its friend list, then a router merges the results.

public String getShardKey(Long userId1, Long userId2) {
    int shard = (userId1.hashCode() & userId2.hashCode()) % 1024;
    return "common_friends_shard_" + shard;
}
-- Lua script for atomic intersection and caching
local result = redis.call('SINTER', KEYS[1], KEYS[2])
redis.call('SET', 'cache:'..KEYS[1]..KEYS[2], result, 'EX', 3600)
return result

Performance comparison (1 billion users simulated)

Set (Bronze) : ~3 TB memory, no cross‑node support, suitable only for small‑scale real‑time queries.

Bitmap (Gold) : ~125 MB memory, fails for cross‑node operations, best for dense ID ranges.

Sharded (King) : ~300 GB memory, cross‑node latency 50‑200 ms, fits ultra‑large social platforms.

Practical Tips

Use consistent hashing for shard placement; only ~25 % of keys move when scaling.

Cache frequent intersection results with a Lua script to avoid repeated network hops.

Shard Routing Example (Java)

public String getShardKey(Long userId1, Long userId2) {
    int shard = (userId1.hashCode() & userId2.hashCode()) % 1024;
    return "common_friends_shard_" + shard;
}

Atomic Intersection via Lua

-- Compute and cache mutual friends atomically
local result = redis.call('SINTER', KEYS[1], KEYS[2])
redis.call('SET', 'cache:'..KEYS[1]..KEYS[2], result, 'EX', 3600)
return result

Cross‑Shard Merge (Python)

def get_common_friends(user1, user2):
    shard_keys = calculate_shards(user1, user2)
    results = []
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(redis_cluster.execute, shard, 'SINTER', user1, user2)
                   for shard in shard_keys]
        results = [f.result() for f in futures]
    # Merge sets from all shards
    return list(set().union(*results))

Bitmap Memory Calculation Example

Set entry: ~80 bytes per friend (metadata + hash table).

Bitmap entry: 1 bit per possible ID → 2002/8 ≈ 250 bytes for ID 2002, regardless of how many friends are set.

When friend IDs are dense (e.g., 1‑100 000), bitmap uses 100 000/8 ≈ 12.5 KB vs. Set using ~3 MB.

Cross‑Node Scenarios

Same node : Direct SINTER or BITOP on the node, latency <10 ms.

Different nodes : Each node returns its friend list; a router merges the lists in memory. Latency 50‑200 ms, acceptable for large‑scale platforms.

Sharding diagram
Sharding diagram
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

shardingredisBitmapScalable ComputingSet IntersectionMutual Friends
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.