
Local Cache Optimization for Outbox Redis in a High‑Traffic Feed Stream Service

To protect the outbox Redis cluster from extreme read amplification during hot events, the service adds a resident local cache of hot creators' latest posts, built on a threshold-based hot-creator list, change-broadcast updates, and checksum verification. The cache achieved a hit rate above 55%, cut peak Redis load by roughly 44%, and reduced peak CPU usage by 37%.

Bilibili Tech

The feed‑stream service experiences severe load spikes during hot events, driving the outbox Redis cluster to 95%–100% CPU utilization and causing timeouts for many users.

Analysis reveals that the system uses a "push‑pull" model: each user has an inbox and an outbox. The inbox receives posts pushed from followed creators, while the outbox stores the user's own posts and is read on demand for large‑follower ("big‑fan") creators, resulting in massive read amplification (hundreds to thousands of times) on the outbox Redis cluster.
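As a minimal sketch of this push-pull read path (all names here, such as `FeedStore` and `fetch_outbox`, are hypothetical, not the service's actual API), a feed request merges the user's pre-fanned-out inbox with on-demand pulls from each followed big creator's outbox; every such pull is one of the amplified Redis reads:

```python
# Minimal sketch of the push-pull feed read path. FeedStore is an
# in-memory stand-in for the inbox/outbox Redis clusters.

class FeedStore:
    def __init__(self, inboxes, outboxes, big_follows):
        self.inboxes = inboxes          # user_id -> [(ts, post_id), ...]
        self.outboxes = outboxes        # creator_id -> [(ts, post_id), ...]
        self.big_follows = big_follows  # user_id -> [creator_id, ...]

    def fetch_inbox(self, user_id):
        return self.inboxes.get(user_id, [])

    def fetch_outbox(self, creator_id):
        return self.outboxes.get(creator_id, [])

    def followed_big_creators(self, user_id):
        return self.big_follows.get(user_id, [])


def build_feed(store, user_id):
    """Merge pushed inbox entries with pulled big-creator outboxes."""
    posts = list(store.fetch_inbox(user_id))        # "push" half: pre-fanned-out
    for creator in store.followed_big_creators(user_id):
        posts.extend(store.fetch_outbox(creator))   # "pull" half: the amplified reads
    posts.sort(reverse=True)                        # newest first by timestamp
    return posts
```

With N followers each refreshing their feed, every big creator's outbox is read N times per refresh cycle, which is the amplification the local cache is meant to absorb.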

Two mitigation ideas were evaluated. Increasing the threshold for "push" processing would reduce outbox reads but would dramatically increase inbox write pressure and storage costs. The chosen solution is to introduce a local cache for hot creators’ latest posts, which adds no hardware cost and can be deployed quickly.

The design defines a follower‑count threshold to identify hot creators (UPs, i.e., uploaders). Daily offline statistics (T+1) compute the list of creators whose follower counts meet or exceed the threshold; the list is pushed to Redis via Kafka. When an outbox service instance starts, it loads the full hot‑creator list and pulls each creator's latest post list from the outbox Redis into a resident local cache. Subsequent changes to the hot‑creator list trigger incremental cache updates. Feed requests consult the local cache first; only misses fall back to the outbox Redis, greatly reducing read amplification.
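The warm-up and read path above can be sketched as follows; the class and method names (`OutboxLocalCache`, `fetch_outbox`) are assumptions, and the Kafka-delivered hot-creator list is abstracted to a plain Python list:

```python
# Hypothetical sketch of the resident local cache: warmed at startup from
# the T+1 hot-creator list, consulted before the outbox Redis on reads.

class OutboxLocalCache:
    def __init__(self, redis, hot_creators):
        self.redis = redis
        self.cache = {}
        for uid in hot_creators:                       # warm-up: one Redis read
            self.cache[uid] = redis.fetch_outbox(uid)  # per hot creator, once

    def apply_list_update(self, added=(), removed=()):
        """Incremental update when the hot-creator list changes."""
        for uid in removed:
            self.cache.pop(uid, None)
        for uid in added:
            self.cache[uid] = self.redis.fetch_outbox(uid)

    def latest_posts(self, uid):
        if uid in self.cache:                   # hot creator: served locally
            return self.cache[uid]
        return self.redis.fetch_outbox(uid)     # cold creator: fall back to Redis
```

The key property is that reads for cached creators never touch Redis at all; Redis is only consulted at warm-up, on list updates, and for the cold tail of creators.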

To avoid a cache stampede when rebuilding data, the solution uses a "change broadcast + asynchronous reconstruction" pattern. When a cached creator publishes or deletes a post, a change event is broadcast to all outbox instances, each of which asynchronously fetches the updated list from Redis and refreshes its local cache. Because hot‑creator changes are infrequent, this approach imposes minimal load on Redis.
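A sketch of this pattern, with the broadcast bus and instances simplified to in-process objects (a real deployment would use a message channel, and the names here are hypothetical):

```python
# Sketch of "change broadcast + asynchronous reconstruction": a change event
# fans out to every instance, and each rebuilds its entry off the request path.

import threading

class OutboxInstance:
    def __init__(self, redis):
        self.redis = redis
        self.cache = {}

    def on_change(self, uid):
        # Asynchronous reconstruction: refresh the entry on a background thread
        # so feed requests never block on the rebuild.
        def rebuild():
            self.cache[uid] = self.redis.fetch_outbox(uid)
        t = threading.Thread(target=rebuild)
        t.start()
        return t

def broadcast_change(instances, uid):
    """Fan the change event out to every outbox instance."""
    return [inst.on_change(uid) for inst in instances]
```

Each instance rebuilds independently from Redis, so a single publish by a hot creator costs one Redis read per instance rather than one per follower request.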

Consistency between the local cache and Redis is ensured through two mechanisms: (1) after each reconstruction, the checksum of the creator’s post list is stored in Redis; (2) a periodic inspection compares cached checksums with Redis checksums, and any mismatch triggers a rebroadcast of the change event for automatic repair.
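The inspection loop can be sketched as below; the hash function and function names are illustrative assumptions (the article does not specify the checksum algorithm), and the Redis-stored checksums are modeled as a plain dict:

```python
# Sketch of the checksum-based consistency check: compare each cached post
# list's checksum against the one stored in Redis, and rebroadcast on mismatch.

import hashlib
import json

def post_list_checksum(posts):
    """Order-sensitive checksum over a creator's post list (algorithm assumed)."""
    payload = json.dumps(posts, separators=(",", ":")).encode()
    return hashlib.md5(payload).hexdigest()

def inspect_caches(local_cache, redis_checksums, rebroadcast):
    """Periodic inspection: any mismatch triggers a change rebroadcast."""
    for uid, posts in local_cache.items():
        if post_list_checksum(posts) != redis_checksums.get(uid):
            rebroadcast(uid)   # self-healing: instances rebuild this entry
```

The rebroadcast reuses the normal change-event path, so drifted entries repair themselves without any dedicated reconciliation code.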

After deployment, the local cache achieved a hit rate exceeding 55%. Compared with the pre‑deployment baseline, the weekend peak load on outbox Redis dropped by roughly 44%, and the CPU usage peak per Redis instance decreased by 37.2%.

Future work includes extending the cache to hot creators of other content types (e.g., anime, paid videos) and adding real‑time hot‑key detection for creators that become hot suddenly, further improving cache coverage and protecting Redis from overload.

Tags: backend development, Redis, consistency, cache optimization, feed stream, performance scaling
Written by

Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
