Evolution of Bilibili Relationship Chain Service: From MySQL to KV Storage and Multi‑Layer Caching
Bilibili’s relationship‑chain service, which handles follows, blacklists, whispers, and mutual follows, migrated from a single sharded MySQL instance to an internal distributed KV store and introduced a multi‑tier cache (memcached, Redis, and a Bloom filter) together with automated hotspot routing. The result: peak traffic near one million QPS at lower latency, and a foundation for multi‑tenant reuse.
The article introduces Bilibili’s relationship‑chain business, which manages follow, blacklist, whisper, and mutual‑follow relations between users. The service provides one‑to‑many relationship queries, full‑list retrieval, and relationship counters; it handles peak QPS close to one million and serves core modules such as the dynamic feed, comments, and recommendations.
1. Storage bottleneck – MySQL
In the early stage, a single MySQL instance with sharded tables (500 tables for relations, 50 for counters) was sufficient. A follow request triggers a multi‑step transaction: lock the counter rows, lock the forward and reverse relation rows, update the relation states, and adjust the follower/following counts. As data grew to the terabyte level, this transaction volume drove up write latency and failure rates, and simply switching to asynchronous writes only turned the problem into message back‑pressure.
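The transaction above can be sketched with SQLite standing in for MySQL. This is a minimal sketch: the table names, single‑table layout, and attr code are simplified assumptions, not the production schema (which shards the relation data across 500 tables and the counters across 50).

```python
import sqlite3

# Simplified stand-in schema; the real deployment shards these tables
# and runs on MySQL.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE relation     (mid INTEGER, fid INTEGER, attr INTEGER,
                           PRIMARY KEY (mid, fid));
CREATE TABLE relation_rev (mid INTEGER, fid INTEGER, attr INTEGER,
                           PRIMARY KEY (mid, fid));
CREATE TABLE counter      (mid INTEGER PRIMARY KEY,
                           following INTEGER NOT NULL DEFAULT 0,
                           follower  INTEGER NOT NULL DEFAULT 0);
""")

ATTR_FOLLOW = 1  # hypothetical numeric code for the "follow" state

def follow(mid: int, fid: int) -> None:
    """One follow request: a single multi-statement transaction."""
    with db:  # BEGIN ... COMMIT; rolled back on any exception
        # Forward row: mid follows fid.
        db.execute("INSERT OR REPLACE INTO relation VALUES (?, ?, ?)",
                   (mid, fid, ATTR_FOLLOW))
        # Reverse row: fid is followed by mid (the fan list).
        db.execute("INSERT OR REPLACE INTO relation_rev VALUES (?, ?, ?)",
                   (fid, mid, ATTR_FOLLOW))
        # Both users' counters change inside the same transaction.
        db.execute("INSERT INTO counter (mid, following) VALUES (?, 1) "
                   "ON CONFLICT(mid) DO UPDATE SET following = following + 1",
                   (mid,))
        db.execute("INSERT INTO counter (mid, follower) VALUES (?, 1) "
                   "ON CONFLICT(mid) DO UPDATE SET follower = follower + 1",
                   (fid,))

follow(10, 20)
```

Every follow thus touches at least four rows under locks in one transaction, which is why write latency and failure rates degraded as volume grew.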
2. Migration to a KV store
The team replaced MySQL with an internal distributed KV store and changed the write path from synchronous MySQL writes to asynchronous KV writes (while keeping an asynchronous MySQL write for downstream binlog consumers). The KV design uses a key pattern {attr|mid}fid, where attr denotes the relationship‑chain type (ATTR_FOLLOW, ATTR_WHISPER, ATTR_BLACK, etc.) and the value is a struct containing attribute (the actual relationship state) and mtime (modification time). This eliminates the separate counter table, because the KV service provides a count operation.
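The key and value layout can be sketched as follows. This is a minimal sketch; the numeric attr codes and helper name are assumptions, since the article names only the constants and the attribute/mtime fields.

```python
from dataclasses import dataclass

# Hypothetical numeric codes; the article names the constants but not
# their values.
ATTR_FOLLOW, ATTR_WHISPER, ATTR_BLACK = 1, 2, 3

@dataclass
class RelationValue:
    attribute: int  # the actual relationship state
    mtime: int      # last modification time (unix seconds)

def make_key(attr: int, mid: int, fid: int) -> str:
    # "{attr|mid}fid": everything inside the braces is a shared prefix,
    # so one user's relations of one type sit under a common scan range,
    # and the KV store's count over that range replaces the counter table.
    return f"{{{attr}|{mid}}}{fid}"

print(make_key(ATTR_FOLLOW, 10, 20))  # → {1|10}20
```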
Read operations become simple point‑gets or scans over the three forward attributes. Write operations, lacking transactional guarantees, are protected by infinite retry logic and ordered processing of messages with the same user‑pair key to avoid conflicts.
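The two write‑path protections can be sketched like this, assuming a partitioned message queue; the hashing scheme and backoff are illustrative, not the production implementation.

```python
import hashlib
import time

def partition_for(mid: int, fid: int, n_partitions: int) -> int:
    # Messages for the same user pair hash to the same partition, so a
    # single consumer processes them in order and writes cannot race.
    digest = hashlib.md5(f"{mid}|{fid}".encode()).hexdigest()
    return int(digest, 16) % n_partitions

def write_with_retry(kv_put, key, value, backoff=0.1):
    # "Infinite" retry: without transactions, a relationship write is
    # never dropped; it is retried until the KV store accepts it.
    while True:
        try:
            kv_put(key, value)
            return
        except Exception:
            time.sleep(backoff)
```

Ordering plus idempotent retries together substitute for the transactional guarantees MySQL used to provide.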
3. Cache evolution
To handle the heavy scan workload, a multi‑layer cache was introduced:
Memcached stores the full follow list per user, reducing KV scans by 97‑99%.
Redis hash caches per‑user follow relationships for point‑queries, achieving 70‑75% hit rate and sub‑millisecond latency.
Later, a Redis KV cache (key = userA|userB, value = relationship) replaced the hash to simplify lookups, but its miss rate rose to ~40% due to many “no‑relationship” queries from comment pages.
A Bloom filter layer was added in front of the Redis KV cache to filter out empty‑relationship queries, raising overall hit rate above 80%.
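The layered read path for point queries can be sketched like this. It is a minimal sketch: plain dicts stand in for the Redis KV cache, and the Bloom filter sizes and helper names are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter over an integer bitset (illustrative sizes)."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0
    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size
    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p
    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

bloom = BloomFilter()   # tracks pairs that HAVE some relationship
redis_kv = {}           # stand-in for the Redis KV cache: "a|b" -> state
NO_RELATION = 0

def get_relation(a, b, kv_scan_fallback):
    key = f"{a}|{b}"
    # 1. Bloom filter: a pair it has never seen definitely has no
    #    relationship; the common case on comment pages.
    if not bloom.might_contain(key):
        return NO_RELATION
    # 2. Redis KV cache hit.
    if key in redis_kv:
        return redis_kv[key]
    # 3. Miss: scan the KV store and backfill the cache.
    rel = kv_scan_fallback(a, b)
    redis_kv[key] = rel
    return rel
```

Writes must add the pair key to the filter. Because Bloom filters admit false positives but never false negatives, a "no relationship" answer from the filter is always safe to return without touching Redis or the KV store.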
4. Hotspot mitigation
When a small set of popular users generated concentrated traffic, Redis instances experienced CPU spikes (>90%). The initial manual hotspot list was replaced by an automated detection tool that temporarily routes hot keys to a local cache, dramatically reducing Redis load.
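The automated routing can be sketched as a sliding‑window counter plus a process‑local cache; the threshold, window, and TTL values below are illustrative assumptions, not the production tuning.

```python
import time
from collections import Counter

class HotspotRouter:
    """Route keys that exceed a request-rate threshold to a local cache."""
    def __init__(self, threshold=1000, window=1.0, ttl=30.0):
        self.threshold, self.window, self.ttl = threshold, window, ttl
        self.counts = Counter()
        self.window_start = time.monotonic()
        self.local = {}  # hot key -> (value, expires_at)

    def record(self, key):
        # Sliding-window request counting; returns True when key turns hot.
        now = time.monotonic()
        if now - self.window_start > self.window:
            self.counts.clear()
            self.window_start = now
        self.counts[key] += 1
        return self.counts[key] >= self.threshold

    def get(self, key, redis_get):
        now = time.monotonic()
        hit = self.local.get(key)
        if hit and hit[1] > now:   # serve a hot key locally, sparing Redis
            return hit[0]
        value = redis_get(key)
        if self.record(key):       # promote newly detected hot keys
            self.local[key] = (value, now + self.ttl)
        return value
```

The local‑cache entry expires after its TTL, so a key that cools down naturally falls back to the normal Redis path.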
5. Future directions
The team plans to expose the relationship‑chain capabilities as a reusable service for multi‑tenant scenarios, broaden the concept of “relationship” to include subscriptions to collections, comics, etc., and enforce strict quota isolation to prevent one business’s traffic surge from affecting core services.
Bilibili Tech