Designing a High‑Availability Cache Consistency Solution for the Creator Red Packet System
This article describes how the creator red‑packet feature was engineered to deliver idempotent, fault‑tolerant, high‑throughput claims. The solution combines multi‑level caching, empty placeholders against cache penetration, binlog‑driven synchronization, active cache invalidation, ordered Kafka consumption, and fallback reconciliation to resolve cache‑DB consistency issues.
The creator red‑packet (video‑based reward) activity requires that each user can claim a red packet exactly once, tolerate failures, and handle massive read/write traffic (hundreds of thousands of claims and millions of queries).
Key technical requirements are idempotency (each user claims at most once), fault tolerance (retries must never produce duplicate claims), and high availability under a heavily read‑dominated workload, served by multi‑level caching.
Challenges include cache consistency, cache penetration, dirty writes, and synchronization latency between the database and cache.
The optimization work targeted four goals: improve the cache hit rate and avoid unnecessary DB fallback, reduce asynchronous cache‑sync delay, resolve cache consistency problems, and eliminate "dirty write" scenarios.
The initial implementation used MySQL unique keys for idempotency, a "retry + idempotent" pattern for fault tolerance, and the classic "cache + DB fallback" read path, but it exhibited cache‑DB inconsistencies under load.
Cache Penetration Solution: When a key does not exist, write an empty placeholder to the cache to prevent repeated DB fallback, and filter out the placeholder on reads.
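The empty-placeholder pattern can be sketched as below. This is a minimal illustration, not the production code: dicts stand in for Redis and MySQL, and the `ClaimCache` / `EMPTY` names are invented for the example.

```python
EMPTY = "__EMPTY__"  # sentinel cached for keys that do not exist in the DB

class ClaimCache:
    def __init__(self, db):
        self.db = db      # stand-in for MySQL: user_id -> claim record
        self.cache = {}   # stand-in for Redis

    def get_claim(self, user_id):
        if user_id in self.cache:
            val = self.cache[user_id]
            # filter the placeholder out on reads: a cached miss is still a miss
            return None if val == EMPTY else val
        val = self.db.get(user_id)
        # cache the miss too, so repeated lookups for absent users stop
        # falling through to the DB (the "penetration" being prevented)
        self.cache[user_id] = val if val is not None else EMPTY
        return val
```

In production the placeholder would also carry a short TTL so a later genuine claim is not masked for long.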
Active Cache Invalidation: After updating the claim status in the DB, explicitly delete the corresponding cache entry to avoid stale reads caused by binlog latency.
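The write path then looks roughly like this sketch (dicts again stand in for MySQL and Redis; `ClaimStore` is a hypothetical name). The point is the ordering: update the DB first, then actively delete the cache entry, so readers fall back to the DB instead of serving a stale value while binlog sync lags.

```python
class ClaimStore:
    def __init__(self):
        self.db = {}     # stand-in for MySQL
        self.cache = {}  # stand-in for Redis

    def read(self, user_id):
        # classic cache + DB fallback read path
        if user_id in self.cache:
            return self.cache[user_id]
        val = self.db.get(user_id)
        if val is not None:
            self.cache[user_id] = val
        return val

    def claim(self, user_id):
        self.db[user_id] = "claimed"   # 1. persist the claim in the DB
        self.cache.pop(user_id, None)  # 2. actively invalidate the cache entry
```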
Binlog Efficiency Optimization: Reduced the binlog processing delay by decreasing the temporary buffering time, cutting the P99 latency from ~1 s to a lower value.
Ordered Binlog Consumption: Ensured binlog messages are consumed in order and routed to the same thread per user, eliminating dirty writes caused by out‑of‑order processing.
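Per-user ordering is typically achieved by hashing the user id to a fixed worker (or Kafka partition), so all events for one user land on the same queue in arrival order. A small sketch of that routing, with invented names (`worker_for`, `route`) and an illustrative worker count:

```python
import hashlib

NUM_WORKERS = 4  # illustrative; would be the partition/thread count in practice

def worker_for(user_id: str) -> int:
    # stable hash: the same user always maps to the same worker,
    # so that user's binlog events are applied serially, in order
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_WORKERS

def route(events):
    # fan events out to per-worker queues while preserving per-user order
    queues = [[] for _ in range(NUM_WORKERS)]
    for ev in events:
        queues[worker_for(ev["user_id"])].append(ev)
    return queues
```

With Kafka, the same effect comes from using the user id as the message key, so the default partitioner pins each user to one partition.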
Synchronous Cache Writes: Changed the cross‑region cache sync from asynchronous Kafka writes to synchronous writes, guaranteeing that both cache clusters receive updates together.
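The change can be pictured as follows: instead of publishing the update to Kafka and letting a remote consumer apply it later, the writer updates both clusters before returning. A toy sketch (dicts stand in for the two cache clusters; `DualCache` is a hypothetical name):

```python
class DualCache:
    def __init__(self, primary, secondary):
        self.clusters = [primary, secondary]

    def set_sync(self, key, value):
        # write every cluster before returning, so neither region can serve
        # a stale entry while an async consumer catches up
        for c in self.clusters:
            c[key] = value
        return all(c.get(key) == value for c in self.clusters)
```

The trade-off is added write latency and a coupling of availability between clusters, which is why the periodic fallback check described next still matters.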
Fallback Strategy: Implemented a periodic task that scans the DB and cache, invalidating mismatched cache entries; also considered delayed double‑delete and setnx‑based placeholders.
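The reconciliation pass is the safety net: any cache entry that disagrees with the DB gets invalidated, which bounds how long a stale value can survive. A minimal sketch (`reconcile` is an invented name; dicts stand in for the real stores):

```python
def reconcile(db, cache):
    # collect cache entries whose value disagrees with the DB
    # (including entries for keys the DB no longer has at all)
    stale = [k for k, v in cache.items() if db.get(k) != v]
    for k in stale:
        del cache[k]  # invalidate; the next read repopulates from the DB
    return stale
```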
After these optimizations, the system handled millions of users during public testing and the Spring Festival event without cache‑DB consistency errors, ensuring stable operation.
Appendix: Q&A
Idempotency is achieved with MySQL primary‑key inserts (INSERT IGNORE / ON DUPLICATE KEY).
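The DB-level guarantee is what makes retries safe: the first insert for a user wins, and any duplicate becomes a no-op. A Python sketch of the same semantics (a dict stands in for the table with `user_id` as its unique key; `insert_ignore` is an invented name mirroring MySQL's `INSERT IGNORE`):

```python
def insert_ignore(table, user_id, record):
    # duplicate key -> ignore the write and report that nothing was inserted,
    # exactly what INSERT IGNORE does against a unique/primary key
    if user_id in table:
        return False
    table[user_id] = record
    return True
```

The caller treats `False` as "already claimed", so a retried request returns the original result instead of granting a second red packet.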
Fault tolerance relies on user‑initiated retry after a failed upload.
Binlog order is maintained by assigning data to the same consumer thread.
In case of severe master‑slave lag, the binlog node can be attached to the primary DB.
Cache routing uses the cache instance of the current data center.
Alternative cache‑hit improvements include Bloom filters and pre‑warming hot users.
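A Bloom filter would sit in front of both cache and DB: user ids that were never written certainly test negative and can be rejected without any lookup, while positives (possibly false) proceed to the normal path. A toy sketch with illustrative sizing (the class and parameters are invented for this example; a real deployment would size the bit array from the expected user count and target false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size      # number of bits (illustrative)
        self.hashes = hashes  # number of hash functions (illustrative)
        self.bits = 0         # Python int as an arbitrary-length bit array

    def _positions(self, item):
        # derive k independent positions by salting one hash function
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False is definitive (no false negatives); True may be a false positive
        return all(self.bits >> p & 1 for p in self._positions(item))
```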
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.