Cross‑IDC Disaster Recovery Solution for KV Storage via a Proxy Layer
This article proposes a cross‑IDC disaster‑recovery architecture for key‑value stores built on a storage proxy layer: the proxy records write logs in the primary data center, forwards them to a read‑only backup center for replay, and routes reads through load balancing. It also notes current limitations, including complex configuration, low synchronization concurrency, and lack of multi‑write support.
1. Background
Currently, some KV stores do not support cross‑IDC deployment, so a single data‑center failure can make the KV store unavailable. This article presents a solution that achieves cross‑IDC disaster‑recovery deployment through a KV storage proxy layer.
2. Implementation Principle
Because multi‑write multi‑read across locations is complex and would make data recovery difficult, this scheme adopts a single‑write multi‑read model: the primary IDC handles both reads and writes, while the backup IDC is read‑only.

The primary IDC's storage proxy writes operation logs to disk. A Notify program distributes the log files, either within the local IDC or to the backup IDC, where the backup storage proxy's Redo service replays them.

If the primary IDC fails, a load balancer routes read requests to the backup IDC, providing read‑side disaster recovery. If the backup IDC fails, read requests are routed to the primary IDC, so reads and writes continue unaffected. To reduce load on the storage proxy, read and write services can be separated, allowing read operations to access local storage directly.
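The single‑write multi‑read flow above can be sketched as follows. This is a minimal model, not the article's actual implementation: the class names (WriteProxy, Notify, RedoService) and the in‑memory dicts standing in for the KV stores and the on‑disk log are all illustrative assumptions.

```python
class RedoService:
    """Backup-IDC side: replays forwarded write logs into the local KV store."""
    def __init__(self):
        self.kv = {}  # stands in for the backup IDC's KV store

    def replay(self, log_entries):
        # Replay each logged operation in order against the backup store.
        for op, key, value in log_entries:
            if op == "SET":
                self.kv[key] = value


class WriteProxy:
    """Primary-IDC storage proxy: write the op log first, then apply locally."""
    def __init__(self):
        self.kv = {}         # stands in for the primary IDC's KV store
        self.write_log = []  # persisted to disk in the real system

    def set(self, key, value):
        self.write_log.append(("SET", key, value))  # 1. record the op log
        self.kv[key] = value                        # 2. apply to local KV


class Notify:
    """Ships accumulated log entries from the primary to the backup's Redo service."""
    def __init__(self, proxy, redo):
        self.proxy, self.redo, self.shipped = proxy, redo, 0

    def ship(self):
        # Forward only the entries not yet delivered, then advance the cursor.
        pending = self.proxy.write_log[self.shipped:]
        self.redo.replay(pending)
        self.shipped = len(self.proxy.write_log)


primary = WriteProxy()
backup = RedoService()
notify = Notify(primary, backup)

primary.set("user:1", "alice")  # write lands in the primary IDC only
notify.ship()                   # Notify delivers the log; backup replays it
```

After `ship()`, the backup's read‑only copy holds `user:1`, so a load balancer can serve reads from either IDC.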
3. KV Storage Proxy Layer Implementation Principle
Clients access the KV storage proxy through an API. For write operations, the proxy first appends to a write log and then applies the operation to the local KV store. A log‑forwarding program scans the logs every 10 ms and forwards them to a log‑conversion service. The conversion service rewrites every command as a Set operation, using a Get against the local KV store to fetch the key's latest value; the converted logs are then synchronized to the other city's Redo service as Set commands. In addition, a consistency‑check service scans log files older than one minute; if it detects data inconsistency between the two locations, it generates logs that need to be retried.
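The conversion step can be sketched as below: instead of replaying the original command (which may not be idempotent, e.g. an increment), the service ships a Set of the key's current local value. The function name, the Redis‑like command names, and the DEL branch for missing keys are assumptions for illustration, not details from the article.

```python
def convert_to_set(log_entry, local_kv):
    """Convert any logged write command into a SET of the key's latest
    local value, which the backup IDC's Redo service can replay safely."""
    op, key = log_entry[0], log_entry[1]
    latest = local_kv.get(key)   # GET the current value from the local KV store
    if latest is None:
        return ("DEL", key)      # assumed handling: key is gone, ship a delete
    return ("SET", key, latest)


kv = {"counter": 5}
# An INCR was logged; replication ships the resulting value instead.
print(convert_to_set(("INCR", "counter"), kv))  # ('SET', 'counter', 5)
```

Shipping the latest value also means several logged writes to the same key collapse into one Set, at the cost of an extra Get per log entry.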
4. Summary
Most services currently use KV storage as persistent storage, but KV stores themselves often do not support cross‑IDC disaster recovery. This article provides a generic solution to achieve cross‑IDC disaster recovery for KV storage.
However, issues remain: the large number of configuration files makes operations inconvenient, there is no friendly management console, cross‑IDC write‑log synchronization has low concurrency, and multi‑write multi‑read is not supported. Future work can continue to improve these aspects.
Tencent Music Tech Team
Public account of Tencent Music's development team, focusing on technology sharing and communication.