How Bilibili Scaled Its Relationship Graph: From MySQL to KV, Caches, and Hotspot Resilience
This article details how Bilibili’s relationship‑chain service evolved from a MySQL‑based design to a KV store with multi‑layer caching, introducing bloom‑filter‑enhanced Redis caches and hotspot mitigation techniques to sustain near‑million QPS traffic while maintaining data accuracy and high availability.
1. Relationship‑Chain Business Overview
On Bilibili’s main site, the relationship chain models the relationships between users: follow, blacklist, whisper, mutual follow, and special follow. The service provides one‑to‑many relationship queries, full‑list retrieval, and counting. It handles peak traffic close to one million QPS and backs core features such as the dynamic and comment systems.
2. Storage Bottleneck and Evolution
Initially, a sharded MySQL schema (500 tables for relationships, 50 tables for counts) handled the modest early data volume. A single follow action required multiple row locks and updates across relationship and count tables, leading to heavy transactional load.
As data grew to terabytes and write traffic increased, MySQL could no longer sustain the load, causing higher failure rates and message back‑pressure. The team migrated the primary store to an internally built distributed KV store, replacing synchronous MySQL writes with asynchronous KV writes while still replicating to MySQL for binlog consumers.
The KV store uses keys of the form {attr|mid}fid, where attr denotes the relationship type (e.g., ATTR_FOLLOW, ATTR_BLACK), mid is the user who owns the relationship, and fid is the target user. Each value is a struct holding the attribute (WHISPER, FOLLOW, FRIEND, BLACK) and mtime, the last modification time. This design eliminates the need for a separate count table because the KV service provides a native count operation.
3. Cache Iterations for Rapid Growth
3.1 Memcached Layer – A memcached cache stores full follow lists to reduce costly KV scan operations. It absorbs 97‑99% of requests, so only a few thousand QPS fall through to the KV store.
3.2 Redis Hash Layer – For point‑lookup scenarios, a Redis hash maps each user ID (mid) to a hash of related user IDs and their relationship data. This brings latency down to about 1 ms with a 70‑75% hit rate, supporting near‑million QPS.
3.3 Redis KV Layer – To simplify the architecture, a Redis KV cache stores direct mid‑fid relationships. Misses trigger KV get / batch_get calls, which are faster than scans.
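The read path of such a per‑pair cache might look roughly like the sketch below. It is a sketch under stated assumptions: a plain dict stands in for Redis, backing_batch_get is a hypothetical callable standing in for the KV store's batch_get, and storing None for pairs with no relationship is an assumed way of caching negative results.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical signature: (mid, fids) -> {fid: attribute} for fids that exist.
BatchGet = Callable[[int, list], Dict[int, str]]


class PairCache:
    """Cache-aside lookup for (mid, fid) pairs; a dict stands in for Redis."""

    def __init__(self, backing_batch_get: BatchGet) -> None:
        self._cache: Dict[tuple, Optional[str]] = {}
        self._batch_get = backing_batch_get
        self.backing_calls = 0  # how often we had to go to the KV store

    def lookup(self, mid: int, fids: Iterable[int]) -> Dict[int, Optional[str]]:
        result: Dict[int, Optional[str]] = {}
        missing = []
        for fid in fids:
            key = (mid, fid)
            if key in self._cache:
                result[fid] = self._cache[key]
            else:
                missing.append(fid)
        if missing:
            # One batched point read for all misses instead of a scan.
            self.backing_calls += 1
            fetched = self._batch_get(mid, missing)
            for fid in missing:
                value = fetched.get(fid)  # None means "no relationship"
                self._cache[(mid, fid)] = value  # negatives are cached too
                result[fid] = value
        return result


# Toy backing store: user 1 follows 2 and has blacklisted 3.
store = {(1, 2): "FOLLOW", (1, 3): "BLACK"}


def toy_batch_get(mid: int, fids: list) -> Dict[int, str]:
    return {fid: store[(mid, fid)] for fid in fids if (mid, fid) in store}


cache = PairCache(toy_batch_get)
first = cache.lookup(1, [2, 3, 4])   # all misses, one batched read
second = cache.lookup(1, [2, 4])     # served entirely from the cache
```

Caching the "no relationship" answer keeps repeated lookups for unrelated pairs away from the KV store, at the cost of holding many empty entries.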
3.4 Bloom‑Filter + Redis KV – A per‑user Bloom filter screens out lookups for user pairs that have no relationship, the “empty sentinel” case that is common in comment‑driven queries. Only when the filter indicates a possible relationship does the request proceed to the Redis KV layer, raising the overall cache hit rate above 80%.
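The filter‑then‑lookup flow can be illustrated with a small self‑contained Bloom filter. The sizes, the SHA‑256 based hashing, and the names BloomFilter and lookup_relation are assumptions for the example; the article does not describe how the production filter is built or stored.

```python
import hashlib
from typing import Dict, Iterator, Optional


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def lookup_relation(
    mid: int,
    fid: int,
    filters: Dict[int, BloomFilter],
    pair_cache: Dict[tuple, str],
) -> Optional[str]:
    """Consult the per-user filter first, then the per-pair cache."""
    bloom = filters.get(mid)
    if bloom is not None and not bloom.might_contain(str(fid)):
        return None  # no relationship; the cache is never touched
    # Possible relationship (or a false positive): do the real lookup.
    return pair_cache.get((mid, fid))


# User 1 follows users 2 and 3.
user_filter = BloomFilter()
for followed in (2, 3):
    user_filter.add(str(followed))
filters = {1: user_filter}
pair_cache = {(1, 2): "FOLLOW", (1, 3): "FOLLOW"}

hit = lookup_relation(1, 2, filters, pair_cache)
miss = lookup_relation(1, 999, filters, pair_cache)
```

A Bloom filter never reports a stored item as absent, so a negative answer can be returned immediately; an occasional false positive only costs one extra cache lookup that comes back empty.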
4. Hotspot Resilience
During traffic spikes on popular creators, Redis instances saw CPU usage climb to 70‑90%. The team first introduced a local, in‑process cache for hot users, and later an automated hotspot detection tool that dynamically adds hot keys to that local cache, reducing Redis load and preventing RPC‑level errors.
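One simple way to picture automated hotspot promotion is a per‑window access counter that copies a key into a local cache once it crosses a threshold. The article does not say how the real detection tool decides that a key is hot, so the fixed window, the threshold, the TTL, and the HotspotCache name below are all assumptions.

```python
import time
from collections import Counter
from typing import Any, Callable, Dict, Tuple


class HotspotCache:
    """Promote keys to a local cache after repeated remote fetches."""

    def __init__(
        self,
        fetch: Callable[[Any], Any],      # stands in for the Redis read
        threshold: int = 3,               # fetches per window before promotion
        window_seconds: float = 1.0,
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._threshold = threshold
        self._window = window_seconds
        self._ttl = ttl_seconds
        self._clock = clock
        self._counts: Counter = Counter()
        self._window_start = clock()
        self._local: Dict[Any, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: Any) -> Any:
        now = self._clock()
        if now - self._window_start >= self._window:
            # Start a fresh counting window.
            self._counts.clear()
            self._window_start = now
        entry = self._local.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # hot key served locally, no remote call
        value = self._fetch(key)
        self._counts[key] += 1
        if self._counts[key] >= self._threshold:
            # Key is hot in this window: keep a short-lived local copy.
            self._local[key] = (value, now + self._ttl)
        return value


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
remote_calls = []


def fake_fetch(key: str) -> str:
    remote_calls.append(key)
    return f"value-for-{key}"


cache = HotspotCache(fake_fetch, clock=lambda: fake_now[0])
for _ in range(5):
    cache.get("popular-creator")
calls_during_spike = len(remote_calls)  # 3 fetches, then 2 local hits

fake_now[0] = 10.0                      # local entry has expired by now
cache.get("popular-creator")
calls_after_expiry = len(remote_calls)
```

The short TTL bounds how stale a locally cached hot key can get, which matters for a service that also has to keep relationship data accurate.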
5. Long‑Term Planning
The team aims to expose the relationship‑chain capabilities as a multi‑tenant service for other business units, generalize relationship objects (e.g., feeds, collections, comics), and enforce strict quota isolation to protect core services from traffic surges.
Overall, the migration from MySQL to KV, combined with layered caching and automated hotspot handling, enabled Bilibili to sustain rapid growth while keeping latency low and reliability high.
dbaplus Community