Dynamo‑Style Leaderless Replication and Multi‑Media Storage Practices at Ctrip Hotel (Hare & InfoKeeper)
This article explains the principles of Dynamo‑style leaderless replication, strict and loose quorum, version repair, and how Ctrip Hotel applied these concepts in its multi‑media storage systems Hare and InfoKeeper to achieve high availability and performance.
Dynamo‑style databases originate from Amazon's Dynamo paper, which describes a leaderless replicated key‑value store. Inspired by this design, Ctrip Hotel built a multi‑media reservation store (Hare) and a high‑availability dynamic information service (InfoKeeper). This article introduces the theory and its practical implementation.
1. Dynamo‑Style Databases
In distributed systems, data is replicated to improve availability and performance. Replication can be single‑leader, multi‑leader, or leaderless. Single‑leader replication is simple, but write throughput is capped by the leader, and writes become unavailable while a failed leader is being replaced. Multi‑leader replication spreads the write load but requires conflict resolution. Leaderless (Dynamo‑style) replication lets any node handle reads and writes, tolerating temporary inconsistencies that are resolved at read time.
Leaderless replication therefore needs an arbitration rule: which replica's value is correct, and how many nodes must be read to guarantee that the latest value is seen.
1.4 Strict Quorum
Using timestamps or version numbers, the newest value is chosen. The rule R+W>N (R = read quorum, W = write quorum, N = total nodes) guarantees that reads intersect with successful writes, ensuring the latest data is returned.
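As a minimal sketch of this arbitration rule (the function name and response format are illustrative, not Ctrip's actual API), a strict-quorum read collects at least R versioned responses and keeps the highest version:

```python
def quorum_read(responses, r):
    """Arbitrate a strict-quorum read: require at least r responses and
    return the (version, value) pair with the highest version number."""
    if len(responses) < r:
        raise RuntimeError("strict quorum not met")
    return max(responses, key=lambda pair: pair[0])

# N=3, W=2, R=2: since R + W > N, any 2 read replicas overlap any 2
# write replicas, so at least one response carries the latest version.
print(quorum_read([(3, "c"), (2, "b")], r=2))  # → (3, 'c')
```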
Availability calculations for different R/W settings (e.g., N=3, R=W=2) show higher read/write availability than a single node.
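The availability comparison can be reproduced with a simple binomial model, assuming (for illustration) independent node failures and a per-node availability of 0.99:

```python
from math import comb

def quorum_availability(n, k, p):
    """Probability that at least k of n nodes are up, given independent
    per-node availability p — i.e., the chance a k-node quorum is reachable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.99  # assumed per-node availability, for illustration only
print(quorum_availability(3, 2, p))  # ≈ 0.999702, versus 0.99 for one node
```

With N=3 and R=W=2, both reads and writes survive any single node failure, which is why the quorum configuration beats a single node.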
1.5 Loose Quorum
If strict quorum cannot be satisfied, the system may return results based on the available nodes, prioritizing availability over strict consistency. The article discusses probability calculations for reading correct data under loose quorum.
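One way to model that probability (a simplified sketch assuming replicas are chosen uniformly and no repair has run yet): if only `w_ok` of N replicas accepted the last write, an R-replica read misses all fresh copies with probability C(N − w_ok, R) / C(N, R).

```python
from math import comb

def p_read_latest(n, w_ok, r):
    """Probability that a random r-replica read sees the latest value,
    when only w_ok of n replicas accepted the last write (loose quorum,
    before any repair runs)."""
    p_miss = comb(n - w_ok, r) / comb(n, r)
    return 1 - p_miss

# N=3, R=1: if 2 of 3 replicas took the write, a read is stale 1/3 of the time.
print(p_read_latest(3, 2, 1))  # ≈ 0.667
```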
1.6 Version Repair Between Nodes
Two mechanisms are used: write‑repair (asynchronously fixing failed writes via message queues) and read‑repair (correcting stale replicas after a read).
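Read-repair can be sketched as follows; the `Replica` class is a hypothetical in-memory stand-in for the real storage clients, not Ctrip's code:

```python
class Replica:
    """Minimal in-memory replica; a stand-in for a real storage client."""
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key, (0, None))  # (version, value)

    def put(self, key, version, value):
        self.store[key] = (version, value)

def read_with_repair(replicas, key):
    """Return the newest value for key and write it back to every
    stale replica (read-repair)."""
    responses = [(rep, rep.get(key)) for rep in replicas]
    _, (latest_ver, latest_val) = max(responses, key=lambda r: r[1][0])
    for rep, (ver, _) in responses:
        if ver < latest_ver:
            rep.put(key, latest_ver, latest_val)  # correct the stale copy
    return latest_val

a, b, c = Replica(), Replica(), Replica()
a.put("room:42", 2, "booked")  # only one replica saw the latest write
b.put("room:42", 1, "free")
print(read_with_repair([a, b, c], "room:42"))  # → booked
print(b.get("room:42"))                        # → (2, 'booked') after repair
```

Write-repair works in the opposite direction: failed writes are queued (e.g., on a message queue) and replayed asynchronously until every replica converges.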
2. Extending Leaderless Replication to Multi‑Media Storage
In Ctrip Hotel's systems, the notion of a "node" is generalized from a process running identical code (e.g., a Redis master and its slave) to an entire storage medium: multi‑media storage writes each record to several media (Redis, Trocks, HBase, etc.), tolerates the failure of individual media, and lets arbitration determine the final value.
3. Hare: Multi‑Media Reservation Store
Hare stores reservation data across Redis, Trocks, and HBase. It uses loose quorum (N=3, W=1, R=1) with version numbers for arbitration. Although W=1, the system expects two successful writes; if only one succeeds, the write is still considered successful to favor availability.
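The write side of this policy can be sketched like so (the `Medium` class and function names are hypothetical stand-ins for the real Redis/Trocks/HBase clients):

```python
class Medium:
    """Minimal stand-in for one storage medium (Redis, Trocks, HBase)."""
    def __init__(self, healthy=True):
        self.healthy, self.store = healthy, {}

    def put(self, key, version, value):
        if not self.healthy:
            raise ConnectionError("medium unavailable")
        self.store[key] = (version, value)

def hare_write(media, key, version, value):
    """Write the versioned value to all media. Hare expects two successes
    but, with W=1, accepts a single success to favor availability."""
    ok = 0
    for m in media:
        try:
            m.put(key, version, value)
            ok += 1
        except ConnectionError:
            pass  # failed media are caught up later by write-repair
    if ok == 0:
        raise RuntimeError("write rejected: no medium accepted it")
    return ok

redis, trocks, hbase = Medium(), Medium(), Medium(healthy=False)
print(hare_write([redis, trocks, hbase], "order:1", 1, "confirmed"))  # → 2
```

On reads, the version numbers stored alongside each value drive the arbitration described in section 1.4.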
Flow diagrams illustrate read/write processes under loose quorum.
4. InfoKeeper: High‑Performance Dynamic Information Store
InfoKeeper adapts Hare's architecture for hotel price‑state storage, using only Redis and Trocks (N=2, W=1, R=1). It distinguishes "primary" media, which participate in arbitration, from "secondary" media, which are write‑only and never affect the client response. In production, InfoKeeper handles billions of records at 400k QPS while cutting hardware costs by 20%.
InfoKeeper also treats message queues (QMQ, Kafka) and SOA interfaces as storage media for push‑based data propagation.
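The primary/secondary split can be sketched as follows; `Medium` and the function names are illustrative assumptions, not InfoKeeper's actual API:

```python
class Medium:
    """Minimal stand-in for a storage client (Redis, Trocks, QMQ, Kafka, ...)."""
    def __init__(self, healthy=True):
        self.healthy, self.store = healthy, {}

    def put(self, key, version, value):
        if not self.healthy:
            raise ConnectionError("medium down")
        self.store[key] = (version, value)

def infokeeper_write(primaries, secondaries, key, version, value):
    """Primary media decide success (N=2, W=1) and serve read arbitration;
    secondary media (queues, SOA pushes) are written fire-and-forget and
    never affect the client response."""
    ok = 0
    for m in primaries:
        try:
            m.put(key, version, value)
            ok += 1
        except ConnectionError:
            pass  # left to write-repair
    for m in secondaries:
        try:
            m.put(key, version, value)  # push-based propagation
        except ConnectionError:
            pass  # a secondary failure is invisible to the caller
    return ok >= 1  # W=1

redis, trocks = Medium(), Medium()
kafka = Medium(healthy=False)
print(infokeeper_write([redis, trocks], [kafka], "price:h1", 7, "599"))  # → True
```

Treating queues and SOA endpoints as just another medium keeps the propagation path inside the same write pipeline, so downstream consumers get updates without a separate sync job.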
5. Validation of Design Goals
Quarterly fault‑injection drills on individual media confirm that both Hare and InfoKeeper meet design expectations: after failures, write‑repair catches up and the system remains operational.
6. Outlook
Future work aims to merge Hare and InfoKeeper code into a reusable component, lowering the barrier for new services to adopt multi‑media storage without deep knowledge of the underlying mechanisms.
Ctrip Technology
The official Ctrip Technology account: sharing, exchange, and growth.