How Bilibili Scaled Its Relationship Chain Service from MySQL to KV and Redis

This article describes how Bilibili's relationship-chain service moved from a MySQL-based design to a KV store with asynchronous writes, layered memcached, Redis hash, and Redis KV caches on top of it, added a bloom filter in front of the cache, and introduced hotspot mitigation to sustain peak traffic close to one million QPS.

Su San Talks Tech

1. Relationship Chain Overview

From the main site's perspective, a relationship chain represents the relationship between user A and user B. It covers the follow, blacklist, whisper, mutual follow, and special follow states. The service provides one-to-many relationship queries, full-list retrieval, and counting. It handles peak QPS near one million and supports core features such as dynamics and comments.

2. Storage Bottleneck – Evolution

Initially, MySQL was used with sharded tables for relationships and counts. As the data grew to terabytes, transaction overhead caused high write failure rates. The solution was to migrate storage to a KV store with asynchronous writes, keeping MySQL as a redundant binlog source. Each KV key has the form {attr|mid}fid, and each value stores the relationship attribute and the modification time.
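The key layout and the asynchronous write path can be illustrated with a minimal sketch. It assumes an in-memory dict in place of the real KV store and a plain queue in place of whatever messaging layer Bilibili uses; `RelationValue`, `make_key`, and `AsyncKVWriter` are hypothetical names chosen for this example, not part of the original system.

```python
import queue
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationValue:
    attribute: str  # relationship attribute, e.g. "FOLLOW"
    mtime: int      # modification time as Unix seconds (assumed unit)


def make_key(attr: str, mid: int, fid: int) -> str:
    """Build a key of the form {attr|mid}fid."""
    return f"{{{attr}|{mid}}}{fid}"


class AsyncKVWriter:
    """Accepts writes immediately and applies them to the store later."""

    def __init__(self) -> None:
        self.store: dict = {}          # stand-in for the KV store
        self.pending: queue.Queue = queue.Queue()

    def submit(self, attr: str, mid: int, fid: int,
               attribute: str, mtime: Optional[int] = None) -> None:
        # The caller returns as soon as the write is queued.
        if mtime is None:
            mtime = int(time.time())
        self.pending.put((make_key(attr, mid, fid),
                          RelationValue(attribute, mtime)))

    def drain(self) -> int:
        """Apply all queued writes; a background worker would call this."""
        applied = 0
        while True:
            try:
                key, value = self.pending.get_nowait()
            except queue.Empty:
                return applied
            self.store[key] = value
            applied += 1
```

In this sketch a request only pays for the enqueue, and the store is updated when the worker drains the queue, which is the general trade-off of an asynchronous write path: lower write latency in exchange for a short window in which the store lags behind.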

Attributes define five chain types (whisper, follow, blacklist, whispered‑by, followed‑by) and map to four possible relationship attributes (WHISPER, FOLLOW, FRIEND, BLACK). This design eliminates the need for a separate count table.
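One way to read the claim about the count table is that a count can be derived by scanning the keys under a chain's prefix. The sketch below illustrates that reading only. The enum names, the rule that a mutual follow upgrades both sides to FRIEND, and the helper functions are all assumptions made for this example rather than details confirmed by the source.

```python
from enum import Enum


class Chain(Enum):
    WHISPER = "whisper"
    FOLLOW = "follow"
    BLACKLIST = "blacklist"
    WHISPERED_BY = "whispered_by"
    FOLLOWED_BY = "followed_by"


class Attr(Enum):
    WHISPER = "WHISPER"
    FOLLOW = "FOLLOW"
    FRIEND = "FRIEND"
    BLACK = "BLACK"


def key(chain: Chain, mid: int, fid: int) -> str:
    return f"{{{chain.value}|{mid}}}{fid}"


def follow(store: dict, mid: int, fid: int, now: int) -> None:
    """Record that mid follows fid, writing both directions of the chain."""
    back = key(Chain.FOLLOW, fid, mid)
    # Assumed rule: if fid already follows mid, the pair becomes FRIEND.
    attr = Attr.FRIEND if back in store else Attr.FOLLOW
    store[key(Chain.FOLLOW, mid, fid)] = (attr, now)
    store[key(Chain.FOLLOWED_BY, fid, mid)] = (attr, now)
    if attr is Attr.FRIEND:
        store[back] = (Attr.FRIEND, now)
        store[key(Chain.FOLLOWED_BY, mid, fid)] = (Attr.FRIEND, now)


def count(store: dict, chain: Chain, mid: int) -> int:
    """Count entries on one chain by scanning its key prefix."""
    prefix = f"{{{chain.value}|{mid}}}"
    return sum(1 for k in store if k.startswith(prefix))
```

A real KV store would serve the prefix scan from an ordered index instead of iterating over every key as the dict-based stand-in does here.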

3. Rapid Growth – Cache Iteration

To reduce expensive KV scans, a memcached layer caches full follow lists. Redis hash caches one‑to‑many relationships, using user ID as the key and follower IDs as hash fields. Later a Redis KV cache replaced the hash to enable point‑lookups, improving latency to ~1 ms and supporting near‑million QPS.
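The point-lookup pattern can be sketched as a cache-aside read. The example below uses plain dicts in place of Redis and the KV store so that it runs on its own; the `rel:{mid}:{fid}` key format, the `EMPTY` sentinel value, and the `RelationCache` class are hypothetical and are not taken from Bilibili's implementation.

```python
EMPTY = "EMPTY"  # assumed sentinel meaning "no relationship exists"


class RelationCache:
    """Cache-aside lookup of one-to-many relationships, one key per pair."""

    def __init__(self, kv_store: dict) -> None:
        self.kv_store = kv_store  # stand-in for the KV store: (mid, fid) -> attribute
        self.cache: dict = {}     # stand-in for the Redis KV cache
        self.kv_reads = 0         # counts how often the cache was missed

    @staticmethod
    def _cache_key(mid: int, fid: int) -> str:
        return f"rel:{mid}:{fid}"

    def relations(self, mid: int, fids: list) -> dict:
        """Return {fid: attribute} for every fid that mid has a relationship with."""
        result = {}
        for fid in fids:
            ck = self._cache_key(mid, fid)
            cached = self.cache.get(ck)
            if cached is None:
                # Cache miss: read the backing store and backfill the cache,
                # storing a sentinel when no relationship exists.
                self.kv_reads += 1
                cached = self.kv_store.get((mid, fid), EMPTY)
                self.cache[ck] = cached
            if cached != EMPTY:
                result[fid] = cached
        return result
```

With one cache key per (mid, fid) pair, a query for a handful of targets touches only those keys and never has to load a user's entire list. A production version would batch the lookups (for example with a multi-get) and set expirations on the cached entries.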

A bloom filter was added in front of the Redis KV cache to filter out empty‑sentinel queries, achieving over 80% hit rate and reducing unnecessary KV scans.
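As an illustration of the idea, here is a small bloom filter placed in front of a lookup. It is a sketch under assumptions: the filter size, the number of hash functions, the SHA-256-based hashing, and the `lookup_relation` helper are choices made for this example, and the source does not say how Bilibili's filter is built or populated.

```python
import hashlib
from typing import Callable, Optional


class BloomFilter:
    """Fixed-size bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def lookup_relation(bloom: BloomFilter,
                    backend: Callable[[int, int], Optional[str]],
                    mid: int, fid: int) -> Optional[str]:
    """Skip the backend entirely when the filter rules the pair out."""
    if not bloom.might_contain(f"{mid}:{fid}"):
        return None  # definitely no relationship
    return backend(mid, fid)  # may still be None on a false positive
```

Because a bloom filter never produces false negatives, a "not present" answer can be trusted and the query returns immediately. A "maybe present" answer still has to be checked against the cache or the store.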

4. Risk – Hotspot Resilience

Hotspot traffic (e.g., popular creators) caused Redis CPU spikes. Initially, hotspots were manually added to a whitelist with local cache fallback. An automated hotspot detection tool now dynamically adds hot keys to local cache, smoothing load and preventing CPU saturation.
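Automated hotspot handling can be sketched as a per-key sliding-window counter that promotes a key to an in-process cache once it crosses a threshold. The threshold, window, and TTL values, along with the `HotKeyCache` class itself, are assumptions for illustration; the source does not describe how the actual detection tool works.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class HotKeyCache:
    """Promote keys to a local cache once they are read often enough."""

    def __init__(self, loader: Callable[[str], object], threshold: int = 3,
                 window: float = 1.0, ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader          # reads from the remote cache (e.g. Redis)
        self.threshold = threshold    # reads within the window that mark a key hot
        self.window = window          # sliding window length in seconds
        self.ttl = ttl                # how long a hot key stays in local cache
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of recent remote reads
        self.local = {}                 # key -> (value, expires_at)

    def get(self, key: str):
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]           # served locally, no remote load
        self.local.pop(key, None)     # drop an expired entry, if any

        value = self.loader(key)
        recent = self.hits[key]
        recent.append(now)
        # Discard reads that have fallen out of the sliding window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.local[key] = (value, now + self.ttl)
            recent.clear()
        return value
```

The short TTL bounds how stale a locally cached hot key can become. A production version would also cap the size of the `hits` map so that tracking cold keys does not grow without limit.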

5. Long‑term Planning

Future work includes exposing the relationship‑chain API as a multi‑tenant service for other business units, generalizing relationship objects (e.g., up‑hosts, collections, comics), and improving stability through zero‑trust configurations and quota enforcement to isolate traffic spikes.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
