Design and Architecture of Bilibili's Thumb-up Service

The article details Bilibili’s thumb‑up service architecture, covering required business and platform capabilities, handling high read/write traffic and hot‑spot pressures, a three‑tier storage design using TiDB, Redis cache and local heap cache, disaster‑recovery mechanisms, asynchronous jobs, and future modularization plans.

Bilibili Tech

This article introduces the design and implementation of the thumb‑up (like) service used on Bilibili. It explains what a thumb‑up is, the business and platform capabilities required, the pressures the system must handle, and the overall architecture.

1. What is a thumb‑up?

A thumb‑up (like) or thumb‑down (dislike) can be applied to videos, dynamic posts, articles, comments, and danmaku. The system provides APIs for liking, disliking, and querying related data.

2. Required system capabilities

Business capabilities: like/unlike, dislike/undislike, query like status for a single item or a batch of items, retrieve like counts, get a user's liked items, list users who liked an item, and obtain a user's total received likes.

Platform capabilities: fast, configuration‑level integration of new businesses, and multi‑tenant data isolation in both cache and DB.

Disaster‑recovery capabilities: graceful degradation when storage, cache, the message queue, or a data center becomes unavailable.

3. System pressures

Traffic pressure: read queries (status, count) exceed 300 k QPS and write operations (like/dislike) exceed 15 k QPS. To reduce DB I/O, the service aggregates like‑count changes in memory over 10 s intervals and persists only the aggregated result.

Hot‑spot pressure: popular items create hot spots in the DB and cache; a hotspot‑identification mechanism moves hot data into a local cache with an appropriate TTL.

Storage pressure: like data scales to hundreds of billions of records, so efficient KV‑style storage is required to keep costs down.

Unknown disasters: DB crashes, Redis cluster jitter, data‑center failures, network faults, and so on.
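
The 10 s in‑memory aggregation mentioned under traffic pressure could look roughly like the sketch below. The `LikeAggregator` class, its `flush_fn` callback, and the explicit `flush()` call are illustrative assumptions, not Bilibili's code; a real job would call `flush()` from a 10 s timer and would need locking around the pending map.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (business_id, message_id)


class LikeAggregator:
    """Accumulates like/dislike deltas in memory and writes them out in batches.

    Hypothetical sketch: not thread-safe, and flush() is expected to be
    driven by an external timer (for example every 10 seconds).
    """

    def __init__(self, flush_fn: Callable[[Dict[Key, Tuple[int, int]]], None]):
        self._flush_fn = flush_fn
        self._pending: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])

    def add(self, business_id: int, message_id: int,
            likes: int = 0, dislikes: int = 0) -> None:
        # Merge the delta into the pending entry instead of touching the DB.
        delta = self._pending[(business_id, message_id)]
        delta[0] += likes
        delta[1] += dislikes

    def flush(self) -> int:
        """Hand the merged deltas to flush_fn; returns the number of entities written."""
        if not self._pending:
            return 0
        batch = {key: (d[0], d[1]) for key, d in self._pending.items()}
        self._pending.clear()
        self._flush_fn(batch)
        return len(batch)
```

With this shape, a thousand likes on one hot video inside a window turn into a single count update rather than a thousand.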

4. Overall system architecture

The thumb‑up service is divided into five layers:

Traffic routing layer (decides target data center).

Business gateway layer (authentication, anti‑fraud filtering).

Thumb‑up service (thumbup‑service) exposing unified RPC interfaces.

Asynchronous job layer (thumbup‑job).

Data layer (DB, KV, Redis).

The article focuses on the data storage layer and the service/job layers.

5. Three‑tier data storage

① DB layer – TiDB

Likes record table (likes) stores each user action with indexes on user ID (mid) and entity ID (messageID).

Likes count table (counts) aggregates like/dislike numbers per business entity.

TiDB’s distributed nature removes the need for manual sharding.

② Cache layer – Redis (Cache‑Aside pattern)

Two main cache keys are used:

key-value = count:patten:{business_id}:{message_id} - {likes},{disLikes}
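The business ID and the entity ID under that business form the cache key; the like count and dislike count are concatenated into a single value that is stored and updated together.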
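
A minimal Cache‑Aside read for this count key might look as follows. A plain `dict` stands in for Redis, and `load_from_db` is a hypothetical callback representing the TiDB query; only the key and value layout come from the article.

```python
from typing import Callable, Dict, Tuple


def count_key(business_id: int, message_id: int) -> str:
    # Key layout as given in the article (spelling kept as-is).
    return f"count:patten:{business_id}:{message_id}"


def encode_counts(likes: int, dislikes: int) -> str:
    return f"{likes},{dislikes}"


def decode_counts(raw: str) -> Tuple[int, int]:
    likes, dislikes = raw.split(",")
    return int(likes), int(dislikes)


def get_counts(cache: Dict[str, str],
               load_from_db: Callable[[int, int], Tuple[int, int]],
               business_id: int, message_id: int) -> Tuple[int, int]:
    """Cache-Aside read: try the cache, fall back to the DB on a miss, then fill the cache."""
    key = count_key(business_id, message_id)
    raw = cache.get(key)
    if raw is not None:
        return decode_counts(raw)
    counts = load_from_db(business_id, message_id)
    cache[key] = encode_counts(*counts)
    return counts
```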

And for user like lists:

key-value = user:likes:patten:{mid}:{business_id} - member(messageID)-score(likeTimestamp)
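The key is built from mid and the business ID, and the value is a ZSet whose member is the liked entity ID and whose score is the time of the like. When a user likes something new under that business, the liked entity is added to the ZSet via zadd as the latest like record.
To keep the user's like list from growing without bound, the cached like records are trimmed to a fixed length every time a new record is added. As a result, the cached like list has a bounded length, and requests for data beyond that length must fall back to the DB.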
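
The add‑then‑trim behaviour can be modelled in memory as below. This is a sketch, not the production code: a `dict` of member to score stands in for the Redis ZSet, and `MAX_CACHED_LIKES` is a made‑up limit. Against a real Redis, the same effect is commonly achieved with `ZADD` followed by `ZREMRANGEBYRANK key 0 -(N+1)`, which keeps the N highest‑scored members.

```python
from typing import Dict, List

MAX_CACHED_LIKES = 3  # hypothetical limit; a real deployment would use a larger value


def add_like(zset: Dict[int, int], message_id: int, like_ts: int,
             max_len: int = MAX_CACHED_LIKES) -> None:
    """Insert or refresh a like, then drop the oldest entries beyond max_len."""
    zset[message_id] = like_ts  # ZADD: a new member is added, an existing one is rescored
    overflow = len(zset) - max_len
    if overflow > 0:
        oldest_first = sorted(zset, key=lambda m: (zset[m], m))
        for member in oldest_first[:overflow]:
            del zset[member]


def recent_likes(zset: Dict[int, int], limit: int) -> List[int]:
    """Newest likes first, similar to reading the ZSet in reverse score order."""
    return sorted(zset, key=lambda m: (zset[m], m), reverse=True)[:limit]
```

A caller that asks for more entries than the cache holds would need to detect that and query the DB instead.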

③ Local cache – in‑process heap cache identifies hot keys using a min‑heap algorithm and stores them locally with a configurable TTL.
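
One way to picture hot‑key identification with a min‑heap plus a TTL‑bound local cache is sketched below. The article does not describe the exact algorithm, so the windowed access log, the `min_hits` threshold, and both names (`top_k_hot_keys`, `LocalCache`) are assumptions made for illustration.

```python
import heapq
import time
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def top_k_hot_keys(access_log: Iterable[str], k: int, min_hits: int = 1) -> List[str]:
    """Return the k most-accessed keys, hottest first, using a size-k min-heap."""
    if k <= 0:
        return []
    heap: List[Tuple[int, str]] = []  # heap[0] is always the coldest candidate
    for key, hits in Counter(access_log).items():
        if hits < min_hits:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (hits, key))
        elif (hits, key) > heap[0]:
            heapq.heapreplace(heap, (hits, key))  # evict the coldest candidate
    return [key for _, key in sorted(heap, reverse=True)]


class LocalCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: force the caller back to Redis
            return None
        return value
```

A short TTL bounds how stale a locally cached count can become, which is the usual trade‑off when serving hot keys from process memory.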

6. Storage optimization and migration

To reduce TiDB storage cost, historical data is archived to a KV store (Taishan). The KV key patterns are:

1_{mid}_{business_id}_{type}_{message_id} => {origin_id}_{mtime}
2_{mid}_{business_id}_{type}_{mtime}_{message_id} => {origin_id}
3_{message_id}_{business_id}_{type}_{mtime}_{mid} => {origin_id}

These patterns support fast look‑ups for like records, user‑like indexes, and entity‑level indexes.
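
As a sketch, the three entries written for a single like record could be produced as follows. The function name `kv_entries` and the parameter name `type_` are illustrative; the article does not define what values `type` takes, so it is passed through unchanged.

```python
from typing import Dict


def kv_entries(mid: int, business_id: int, type_: int,
               message_id: int, mtime: int, origin_id: int) -> Dict[str, str]:
    """Build the three KV pairs described above for one like record."""
    return {
        # 1: point lookup of a user's record on one entity
        f"1_{mid}_{business_id}_{type_}_{message_id}": f"{origin_id}_{mtime}",
        # 2: user-side index (what this user liked)
        f"2_{mid}_{business_id}_{type_}_{mtime}_{message_id}": f"{origin_id}",
        # 3: entity-side index (who liked this entity)
        f"3_{message_id}_{business_id}_{type_}_{mtime}_{mid}": f"{origin_id}",
    }
```

Placing `mtime` ahead of the trailing ID in patterns 2 and 3 suggests that entries sharing a prefix are meant to be scanned in time order, though the article does not state this explicitly.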

7. Thumb‑up service layer (thumbup‑service)

Provides the consumer‑facing (C‑end) interface and includes disaster‑recovery designs such as dual‑data‑center deployment, DB‑proxy failover, and Redis replicated across two clusters. When both Redis clusters are unavailable, the KV store or TiDB serves read/write traffic under rate limiting.
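
The degradation path for reads can be sketched as a chain of cache readers followed by a rate‑limited fallback to the backing store. Everything here is hypothetical scaffolding (`TokenBucket`, `Throttled`, `read_with_fallback`, and the use of `ConnectionError` to signal an unreachable cluster); the article states only that the KV store or TiDB takes over under rate limiting.

```python
import time
from typing import Callable, Optional, Sequence


class Throttled(Exception):
    """Raised when the fallback path is over its rate limit."""


class TokenBucket:
    """Simple token bucket: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._tokens = capacity
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def read_with_fallback(key: str,
                       cache_reads: Sequence[Callable[[str], Optional[str]]],
                       store_read: Callable[[str], Optional[str]],
                       limiter: TokenBucket) -> Optional[str]:
    """Try each cache cluster in order; if all are unreachable, read the store under a limit.

    A reachable cache's answer (including a miss, None) is returned as-is;
    handling ordinary misses is left to the caller.
    """
    for read in cache_reads:
        try:
            return read(key)
        except ConnectionError:
            continue  # this cluster is down, try the next one
    if not limiter.allow():
        raise Throttled(key)
    return store_read(key)
```

Rate limiting the fallback protects the store from absorbing the full read traffic that the caches normally serve.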

8. Asynchronous job layer (thumbup‑job)

Writes user behavior data (like, dislike, cancel) to persistent storage.

Refreshes caches for like status, lists, and counts.

Publishes async messages for downstream services.

Monitors TiDB binlog latency; if binlog stalls, the service emits its own disaster‑recovery messages to keep downstream consumers up‑to‑date.
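
The binlog check in the last item could be approximated by a watchdog like the one below. It is a rough sketch under stated assumptions: `BinlogWatchdog`, `publish_if_stalled`, and the lag threshold are invented names, and a real monitor would also have to distinguish a stalled binlog from a period with no writes.

```python
import time
from typing import Callable


class BinlogWatchdog:
    """Tracks when the last binlog event was seen and reports a stall."""

    def __init__(self, max_lag_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_lag = max_lag_seconds
        self._clock = clock
        self._last_event_at = clock()

    def on_binlog_event(self) -> None:
        self._last_event_at = self._clock()

    def binlog_stalled(self) -> bool:
        return self._clock() - self._last_event_at > self._max_lag


def publish_if_stalled(watchdog: BinlogWatchdog,
                       publish: Callable[[dict], None],
                       message: dict) -> bool:
    """Publish the job's own message only while the binlog looks stalled."""
    if watchdog.binlog_stalled():
        publish(message)
        return True
    return False
```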

9. Future plans

Modularize the thumb‑up service.

Turn the service into a platform that supports custom data isolation per business.

Explore new business models derived from the like interaction.

The article concludes with an invitation for feedback and links to related technical posts.
