Design and Implementation of Distributed Cache with Eventual and Strong Consistency at Ctrip Finance
This article presents Ctrip Finance's design of a unified high‑availability Redis cache service, covering both eventual‑consistency and strong‑consistency scenarios, the overall architecture, data‑accuracy, completeness and availability mechanisms, lock handling, fault‑tolerant updates, and operational recovery strategies.
The author, a senior technical expert at Ctrip, introduces the evolution of Ctrip Finance's architecture and the need for caching to relieve MySQL read pressure and improve response times, while addressing consistency challenges.
Two main scenarios are described: an eventual‑consistency distributed cache (utag) used for non‑critical data such as risk control and app entry hints, and a strong‑consistency cache for loan‑pre‑service where data must be up‑to‑date.
Eventual‑Consistency Design
The utag service provides a Dubbo cache query interface, deployed across multiple data‑centers (AB sites) with asynchronous MQ‑driven updates (QMQ and Kafka). Data‑accuracy is ensured by a four‑step update process (trigger, fetch DB, compare, update cache) protected with Redis‑based distributed locks and delayed messages for same‑second updates. Data‑completeness is achieved through multiple trigger sources, weekly full‑table refreshes, and cache‑DB sync checks. High availability is provided by cross‑site replication, dual‑MQ backup, and fast‑recovery mechanisms that can refresh the entire cache within 30 minutes.
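The four-step update (trigger, fetch DB, compare, update cache) under a per-key lock can be sketched as follows. This is a minimal illustration, not utag's actual code: the class `TagCacheUpdater`, its method names, and the in-memory maps standing in for MySQL, Redis values, and Redis `SET NX` locks are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model: maps stand in for MySQL and Redis. */
class TagCacheUpdater {
    private final Map<String, String> db = new ConcurrentHashMap<>();    // stands in for MySQL
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stands in for Redis values
    private final Map<String, String> locks = new ConcurrentHashMap<>(); // stands in for Redis SET NX locks

    void writeDb(String key, String value) { db.put(key, value); }
    String readCache(String key) { return cache.get(key); }

    /** Mimics SET key owner NX: succeeds only if nobody holds the lock. */
    boolean tryLock(String key, String owner) {
        return locks.putIfAbsent(key, owner) == null;
    }

    /** Releases the lock only if this owner still holds it. */
    void unlock(String key, String owner) {
        locks.remove(key, owner);
    }

    /**
     * Steps 2 to 4, run when a trigger message (step 1) arrives.
     * Returns false if the lock was busy, so the caller can redeliver later.
     */
    boolean onTrigger(String key, String owner) {
        if (!tryLock(key, owner)) {
            return false;                          // another worker is updating this key
        }
        try {
            String fresh = db.get(key);            // step 2: fetch from DB
            String cached = cache.get(key);
            if (!Objects.equals(fresh, cached)) {  // step 3: compare
                if (fresh == null) {
                    cache.remove(key);             // row is gone: drop the cache entry
                } else {
                    cache.put(key, fresh);         // step 4: update cache
                }
            }
            return true;
        } finally {
            unlock(key, owner);
        }
    }
}
```

Returning `false` on a busy lock, rather than blocking, is one possible design choice; it pairs naturally with MQ redelivery or a delayed message, so a skipped trigger is retried later.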
Strong‑Consistency Design
For loan‑pre‑service, the cache follows a write‑DB‑then‑delete‑cache pattern. To avoid stale reads and write‑read races, two Redis locks are applied: one around DB‑update + cache‑deletion, another around DB‑read + cache‑write. Lock granularity options are discussed, and the default is a pre‑transaction lock (option 3) with fallback to post‑transaction locking when Redis is unavailable.
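The interplay of the two locked sections can be sketched as below. It is an illustration under stated assumptions, not the production design: the class `StrongCache` is hypothetical, maps stand in for MySQL and Redis, both paths are assumed to contend on the same per-key lock, and a reader that finds the lock busy is assumed to serve from the DB without populating the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model: maps stand in for MySQL and Redis. */
class StrongCache {
    private final Map<String, String> db = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, Boolean> locks = new ConcurrentHashMap<>();

    boolean tryLock(String key) { return locks.putIfAbsent(key, Boolean.TRUE) == null; }
    void unlock(String key) { locks.remove(key); }
    boolean isCached(String key) { return cache.containsKey(key); }

    /** Write path: lock, update DB, delete cache, unlock. */
    boolean write(String key, String value) {
        if (!tryLock(key)) {
            return false;               // busy: a real caller would wait or retry
        }
        try {
            db.put(key, value);         // write the DB first
            cache.remove(key);          // then invalidate the cache
            return true;
        } finally {
            unlock(key);
        }
    }

    /** Read path: on a miss, only the lock holder may repopulate the cache. */
    String read(String key) {
        String hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        if (!tryLock(key)) {
            return db.get(key);         // lock busy: serve from DB, leave the cache untouched
        }
        try {
            String fresh = db.get(key);
            if (fresh != null) {
                cache.put(key, fresh);  // no writer can interleave while the lock is held
            }
            return fresh;
        } finally {
            unlock(key);
        }
    }
}
```

Without the lock on the read path, a reader could fetch the old row, pause while a writer updates the DB and deletes the cache, and then write the stale value back. Holding the lock across DB read and cache write closes that window in this model.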
Cache‑deletion failures are compensated by a lightweight cache_key_queue table that records pending deletions; asynchronous retries and scheduled scans clean up stale entries.
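Ahead of the table definition, the compensation idea can be sketched in plain Java: record the key, attempt the delete, clear the record only on success, and let a periodic scan retry whatever is left. The class `CacheKeyQueue`, its method names, and the `Predicate` standing in for a Redis delete call are all hypothetical, and a map replaces the real table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory model of the pending-deletion queue. */
class CacheKeyQueue {
    private final Map<Long, String> pending = new LinkedHashMap<>(); // id -> cache_key
    private final Predicate<String> deleter; // returns true when the cache delete succeeded
    private long nextId = 1;

    CacheKeyQueue(Predicate<String> deleter) {
        this.deleter = deleter;
    }

    /** Records a key that must be invalidated; returns the row id. */
    synchronized long enqueue(String cacheKey) {
        long id = nextId++;
        pending.put(id, cacheKey);
        return id;
    }

    /** Tries the delete and clears the row only on success. */
    synchronized boolean deleteNow(long id) {
        String key = pending.get(id);
        if (key == null) {
            return true;               // already handled
        }
        if (deleter.test(key)) {
            pending.remove(id);
            return true;
        }
        return false;                  // row stays for a later retry
    }

    /** Scheduled scan: retries every pending row and returns how many remain. */
    synchronized int retryPending() {
        for (Long id : new ArrayList<>(pending.keySet())) {
            deleteNow(id);
        }
        return pending.size();
    }
}
```

For example, with `new CacheKeyQueue(key -> redisIsUp())` (where `redisIsUp` is a placeholder), a delete attempted during an outage leaves its row in place, and the next `retryPending()` call after recovery drains it.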
CREATE TABLE `cache_key_queue` (
  `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `cache_key` varchar(1024) NOT NULL DEFAULT '' COMMENT 'key to delete',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 CHARSET=utf8 COMMENT='Cache deletion queue';
Spring's TransactionSynchronization interface is used to insert queue records before commit and delete caches after commit. Because the queue record is written inside the same transaction as the DB change, the two commit or roll back together, and a cache deletion that fails after commit can still be retried from the queue.
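public interface TransactionSynchronization extends Flushable {
    // Abridged listing: the status constants and flush() are omitted.
    void suspend();
    void resume();
    void beforeCommit(boolean readOnly);  // runs before the transaction commits
    void beforeCompletion();
    void afterCommit();                   // runs only after a successful commit
    void afterCompletion(int status);     // runs after commit or rollback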
}
The system also implements a Redis circuit‑breaker: if a configurable number of Redis errors occur within a time window, cache operations are bypassed and the service falls back to direct DB access. Recovery checks the health of all Redis cluster nodes before re‑enabling cache reads and writes, and ensures no pending cache_key_queue entries remain.
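A circuit breaker of this kind can be sketched as below. This is a minimal illustration under assumptions: the class `RedisCircuitBreaker`, its parameters, and the injected clock are hypothetical, and the health check and pending-row count are passed in as plain values rather than queried from a real cluster or table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical sliding-window breaker; not tied to any Redis client library. */
class RedisCircuitBreaker {
    private final int maxErrors;        // errors within the window that trip the breaker
    private final long windowMillis;    // length of the sliding window
    private final LongSupplier clock;   // injected so the window can be tested
    private final Deque<Long> errorTimes = new ArrayDeque<>();
    private boolean open = false;

    RedisCircuitBreaker(int maxErrors, long windowMillis, LongSupplier clock) {
        this.maxErrors = maxErrors;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Call whenever a Redis operation fails. */
    synchronized void recordError() {
        long now = clock.getAsLong();
        errorTimes.addLast(now);
        // Drop errors that have fallen out of the window.
        while (!errorTimes.isEmpty() && now - errorTimes.peekFirst() > windowMillis) {
            errorTimes.removeFirst();
        }
        if (errorTimes.size() >= maxErrors) {
            open = true;                // bypass the cache, go straight to the DB
        }
    }

    /** True while cache reads and writes are allowed. */
    synchronized boolean cacheAllowed() {
        return !open;
    }

    /** Closes the breaker only if every node is healthy and no deletions are pending. */
    synchronized boolean tryRecover(boolean allNodesHealthy, int pendingQueueRows) {
        if (open && allNodesHealthy && pendingQueueRows == 0) {
            open = false;
            errorTimes.clear();
        }
        return !open;
    }
}
```

Requiring an empty deletion queue before recovery matters because any key still pending may hold a stale value; re-enabling cache reads first would serve that value.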
Overall, the eventual‑consistency cache reduces DB pressure, improves response latency (P98 ≈ 10 ms), and achieves ~92 % hit rate, while the strong‑consistency approach lowers core DB QPS by 80 % and cuts average response time by ~10 % through disciplined locking and fallback strategies.
Ctrip Technology
