Building a High‑Performance Cache Cluster: HuoLala’s DataMesh & Redis Journey
This article details how HuoLala tackled cache bottlenecks by evolving from single‑node master‑slave setups to Redis Cluster and a custom DataMesh sidecar, achieving sub‑2 ms latency, 99.999% availability, automated scaling, and advanced data governance on Kubernetes.
1. Background
As HuoLala’s traffic grew, its cache layer faced scaling and stability challenges. The company needed rapid horizontal expansion, faster resource delivery, support for diverse business scenarios, and stronger high‑availability guarantees while migrating services from PHP to Java.
The cache architecture evolved through several stages: single master‑slave, Codis, Sentinel mode, and finally an integrated Redis Cluster with a DataMesh middleware that improved stability and introduced data‑governance capabilities.
2. Challenges
Four key capabilities were required:
Ultra‑low latency: maximum 10 ms, average below 2 ms.
High availability: an SLA of 99.999% (roughly 5 minutes of downtime per year) with recovery within seconds.
Minimize custom client development effort.
Support varied scenarios such as distributed locks.
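The availability target above translates directly into a yearly downtime budget. The short sketch below does that arithmetic; the function name is illustrative and the 365-day year is an assumption, not something stated in the original article.

```python
def downtime_budget_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of downtime allowed over `days` at the given availability percentage."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * days * 24 * 60


if __name__ == "__main__":
    # Compare common SLA tiers; 99.999% is the target named in the article.
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_budget_minutes(pct):.2f} min/year")
```

At 99.999% the budget is about 5.26 minutes per year, which is why failover has to complete within seconds rather than wait for manual intervention.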
These requirements exposed several high‑availability challenges.
HA Options
Master‑slave: solves single‑point failure but needs manual intervention.
Sentinel: automates failover, but does not shard data, so horizontal scaling remains limited and deployment becomes more complex.
Codis proxy: provides cluster management and a visual dashboard, at the cost of an extra network hop.
Redis Cluster: decentralized, relies on gossip for cluster maintenance, supports linear scaling, but requires an intelligent client.
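The "intelligent client" requirement exists because a Redis Cluster client must work out for itself which node owns a key. Redis Cluster maps every key to one of 16384 hash slots using CRC16 of the key modulo 16384, and hashes only the text inside a non-empty {hash tag} when one is present. The sketch below illustrates that mapping; the function name is chosen for illustration, and it is not HuoLala's client code.

```python
import binascii

SLOT_COUNT = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is CRC-16/XMODEM, the variant Redis Cluster uses.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    print(key_slot(b"foo"))
    # Keys sharing a hash tag land in the same slot, so multi-key commands work.
    print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

A client that caches the slot-to-node table can route each command in a single hop, which is what avoids the extra network hop of a proxy such as Codis.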
3. Resource Delivery Efficiency
High‑availability was achieved through a data‑control platform that handles provisioning, scaling, and data governance. Before creating resources, a stateless cache node is initialized and added to a resource pool; during allocation, a scheduling algorithm selects nodes to form a robust cluster.
Scaling and fault migration rely on the Redis MIGRATE command, driven by automated slot‑migration scripts. Data governance includes periodic RDB backups to cloud storage, full‑key analysis, and hot‑spot detection.
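As a rough illustration of what an automated slot‑migration script has to issue, the sketch below builds the command sequence for moving one slot, following the resharding procedure in the Redis documentation. It only plans the commands and sends nothing. The function name, the batch size, and the timeout are assumptions; a real script would discover the keys with CLUSTER GETKEYSINSLOT rather than receive them as an argument.

```python
from typing import Iterable, List, Tuple

# (which node receives the command, command words)
Command = Tuple[str, List[str]]


def plan_slot_migration(
    slot: int,
    src_id: str,
    dst_id: str,
    dst_host: str,
    dst_port: int,
    keys: Iterable[str],
    batch_size: int = 100,   # assumed batch size
    timeout_ms: int = 5000,  # assumed MIGRATE timeout
) -> List[Command]:
    """Return the ordered commands that move `slot` from source to destination."""
    plan: List[Command] = [
        # 1. Destination prepares to import the slot from the source.
        ("destination", ["CLUSTER", "SETSLOT", str(slot), "IMPORTING", src_id]),
        # 2. Source marks the slot as migrating to the destination.
        ("source", ["CLUSTER", "SETSLOT", str(slot), "MIGRATING", dst_id]),
    ]
    key_list = list(keys)
    # 3. Move keys in batches with the multi-key form of MIGRATE.
    for i in range(0, len(key_list), batch_size):
        batch = key_list[i:i + batch_size]
        plan.append(
            ("source",
             ["MIGRATE", dst_host, str(dst_port), "", "0", str(timeout_ms), "KEYS", *batch])
        )
    # 4. Assign the slot to the destination, on the destination first, then the source.
    plan.append(("destination", ["CLUSTER", "SETSLOT", str(slot), "NODE", dst_id]))
    plan.append(("source", ["CLUSTER", "SETSLOT", str(slot), "NODE", dst_id]))
    return plan


if __name__ == "__main__":
    for target, words in plan_slot_migration(42, "src-node", "dst-node", "10.0.0.2", 6379, ["a", "b", "c"], batch_size=2):
        print(target, " ".join(w if w else '""' for w in words))
```

Separating planning from execution keeps the sequence easy to review and log before a control platform applies it.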
4. DataMesh Construction
DataMesh acts as a Redis proxy deployed as a sidecar between the ingress layer and data services. It provides sharding, rate limiting, and seamless cluster switching.
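The article does not say how DataMesh implements rate limiting. One common approach a proxy could take is a per‑key token bucket, sketched below. The class name, parameters, and injectable clock are all hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class KeyRateLimiter:
    """Per-key token bucket: `rate_per_sec` sustained, `burst` maximum at once."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.burst = float(burst)
        self.clock = clock
        # key -> (tokens remaining, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        """Return True if a request for `key` may pass, consuming one token."""
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = KeyRateLimiter(rate_per_sec=100.0, burst=10)
    passed = sum(limiter.allow("hot:item:1") for _ in range(50))
    print(f"{passed} of 50 immediate requests passed")
```

Requests rejected by `allow` would be the point where a proxy applies degradation, for example returning a fallback value instead of forwarding the command to Redis.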
A Kubernetes DaemonSet manages the sidecar's configuration and executable files and publishes them to a host directory. Pods that run the DataMesh sidecar share that same host directory, which separates file publishing from process execution and enables zero‑downtime upgrades.
Why Sidecar?
The sidecar uses a loopback network, adding only 0.02 ms latency. Kubernetes schedules and isolates resources, keeping each pod’s CPU usage below 0.1 core. This design minimizes additional network overhead and operational cost.
Non‑Stop Upgrade Feature
In a typical Kubernetes node, dozens to hundreds of pods run simultaneously. Upgrading each pod’s sidecar individually would cause traffic spikes; batch upgrades would prolong the window. The solution separates configuration upgrades (handled by the DaemonSet) from process upgrades (triggered by the release system), allowing grouped pod upgrades with smooth handover.
5. Final Benefits
Functional gains:
Smart client: dynamically listens for slot migrations and master‑slave switches, updates its topology accordingly, and supports pipelining across shards.
Cluster migration: in disaster scenarios, the system can switch to a new cache cluster automatically via DataMesh.
Data governance: connection isolation reduces pressure on databases, hot‑key analysis reports QPS and payload size, and built‑in rate limiting and degradation mitigate hot‑key issues.
Intelligent operations: an event‑driven model lets DBAs trigger changes (e.g., hot‑key throttling, cluster scaling) through the DataMesh console. The console monitors event status for traceability, and all changes are applied without client‑side intervention, ensuring automatic recovery.
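The smart‑client behavior listed above depends on reacting to Redis Cluster redirections. A MOVED reply means a slot has permanently changed owner, while ASK means a single retry should go to another node during a migration. The sketch below shows how a client might keep its slot table current; the class and function names are hypothetical and do not come from DataMesh.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]


def parse_redirect(error: str) -> Optional[Tuple[str, int, Address]]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return parts[0], int(parts[1]), (host, int(port))


class SlotTable:
    """Client-side cache of which node owns each hash slot."""

    def __init__(self) -> None:
        self._owners: Dict[int, Address] = {}

    def owner(self, slot: int) -> Optional[Address]:
        return self._owners.get(slot)

    def handle_error(self, error: str) -> Optional[Address]:
        """Return the node to retry against, or None if not a redirection."""
        redirect = parse_redirect(error)
        if redirect is None:
            return None
        kind, slot, address = redirect
        if kind == "MOVED":
            # Permanent change: remember the new owner for later commands.
            self._owners[slot] = address
        # ASK is a one-off: retry at `address` after sending ASKING, no table update.
        return address


if __name__ == "__main__":
    table = SlotTable()
    print(table.handle_error("MOVED 3999 127.0.0.1:6381"))
    print(table.owner(3999))
```

Handling this inside the proxy or client library is what lets slot migrations and failovers proceed without changes to application code.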
6. Future Plans
HuoLala’s cache system has settled on Redis Cluster with DataMesh middleware, simplifying integration and strengthening operations. Future extensions aim to proxy other components such as MySQL and MQ, leveraging DataMesh’s programming capabilities to fill additional operational gaps.
