How HuoLa La Built a Hybrid‑Cloud Database Middleware for Massive MySQL Scale
This article details HuoLa La's journey of designing, implementing, and evolving a hybrid‑cloud self‑built database middleware that unifies multi‑cloud environments, achieves up to 1024× MySQL horizontal scaling, and addresses challenges such as multi‑language stacks, high availability, SQL governance, and multi‑AZ deployment.
Background and Motivation
HuoLa La operates a globally distributed internet service with a multi‑cloud, multi‑data‑center architecture. Rapid data growth and an expanding set of services put pressure on databases to deliver high performance, stability, and cloud‑agnostic behavior. Commercial cloud database services are easy to deploy but lack cross‑cloud compatibility, while open‑source solutions often miss enterprise‑grade reliability and customizability. To meet the need for a unified, high‑performance, highly available middleware that works across heterogeneous clouds, the team built an in‑house database proxy called DBProxy.
System Architecture
The overall data‑layer stack (named HLL) consists of:
Ingress layer: Open‑source KONG gateway, custom LAPIGateway, self‑built WAF, and commercial DDoS protection.
Core framework: JAF base framework, HLL‑SOA micro‑service framework, LLjob task system, and HLL‑Monitor monitoring/alerting.
Database middleware: DBProxy – a heterogeneous proxy that integrates with the HLL ecosystem (DMS, configuration & registration centers) and provides cross‑cloud sharding, high availability, and performance optimizations.
DBProxy Technical Stack
Java runtime with ZGC low‑pause garbage collection.
Asynchronous networking via Netty.
Event‑driven programming model.
Heterogeneous proxy layer that supports multiple client languages (Java, Go, Node, etc.).
Key Functionalities
Cross‑cloud sharding & scenario‑based routing: DBProxy abstracts away differences between cloud providers, presenting a unified logical schema to services.
Token‑bucket DB concurrency limiting: Limits concurrent DB sessions based on DB core count; dynamically injects index hints for slow SQL to reduce contention.
ShardingMappingKey: Maintains a mapping table for tables with multiple unique IDs, avoiding full‑shard scans in multi‑ID sharding scenarios.
Dynamic index‑hint injection: At runtime DBProxy can add /*+ INDEX(...) */ hints to queries that would otherwise miss optimal indexes, especially during traffic spikes.
SQL safety audit & deep insight: Captures full SQL text, execution plan, fingerprint (SQLID), and trace context without requiring code changes in business services.
SQL pre‑release warning: Uses the deep‑insight data to flag risky statements in staging environments before they reach production.
SQL traffic simulation: Replays recorded production traffic with accurate timing and payload size to validate capacity and avoid resource waste.
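The ShardingMappingKey idea can be sketched as follows. This is a minimal illustration, not DBProxy's implementation: the CRC32 hash, the in-memory mapping dictionary, the class and method names, and the reading of the article's 1024 figure as a shard count are all assumptions made for the example.

```python
import zlib

SHARD_COUNT = 1024  # assumption: the article's "1024x" figure used as a shard count


def shard_of(sharding_key):
    """Map a sharding key to a shard index with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(sharding_key.encode("utf-8")) % SHARD_COUNT


class MappingKeyRouter:
    """Hypothetical router: resolves a secondary unique ID through a mapping table."""

    def __init__(self):
        # secondary unique ID -> primary sharding key
        # (a real system would persist this; a dict keeps the sketch self-contained)
        self._mapping = {}

    def register(self, sharding_key, secondary_id):
        """Record which sharding key a secondary ID belongs to."""
        self._mapping[secondary_id] = sharding_key

    def route_by_sharding_key(self, sharding_key):
        """A query carrying the sharding key always hits exactly one shard."""
        return [shard_of(sharding_key)]

    def route_by_secondary_id(self, secondary_id):
        """A query carrying only the secondary ID hits one shard if a mapping exists."""
        sharding_key = self._mapping.get(secondary_id)
        if sharding_key is None:
            # No mapping known: fall back to scanning every shard.
            return list(range(SHARD_COUNT))
        return [shard_of(sharding_key)]


router = MappingKeyRouter()
router.register("user-42", "order-9001")
mapped = router.route_by_secondary_id("order-9001")    # single shard
unmapped = router.route_by_secondary_id("order-0000")  # full-shard scan
```

With the mapping in place, a lookup by the secondary ID lands on the same single shard as a lookup by the sharding key; without it, the router has to fan out to all shards.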
Operability Features
Smooth rolling upgrades with connection pre‑warming, fast start‑up, graceful shutdown, and graceful connection release.
Comprehensive monitoring and alerting integrated into HLL‑Monitor, enabling early detection of anomalies.
High‑Availability Design
Clustered deployment across business‑specific resource pools to balance load.
Congestion control that throttles large result‑set transfers, much like a TCP sliding window, so that buffered rows cannot exhaust proxy memory.
Thread‑convergence mechanism that caps the number of active threads to keep CPU and memory usage predictable, especially important for ZGC performance.
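The window-style congestion control described above can be illustrated with a small simulation. It is a sketch under stated assumptions rather than DBProxy's code: the `WindowedRelay` class, the byte-based window, and the one-chunk-per-acknowledgement client are all hypothetical. The point is only that the proxy stops reading from the backend once the unacknowledged bytes would exceed the window.

```python
from collections import deque


class WindowedRelay:
    """Hypothetical relay that keeps buffered result-set bytes under a fixed window."""

    def __init__(self, window_bytes):
        self.window_bytes = window_bytes
        self.in_flight = 0     # bytes read from the backend, not yet taken by the client
        self.buffer = deque()
        self.peak = 0          # highest in_flight value observed

    def can_read(self, chunk_size):
        # Always admit one chunk when idle so an oversized chunk cannot stall forever.
        return self.in_flight == 0 or self.in_flight + chunk_size <= self.window_bytes

    def on_backend_chunk(self, chunk):
        self.buffer.append(chunk)
        self.in_flight += len(chunk)
        self.peak = max(self.peak, self.in_flight)

    def on_client_ack(self):
        chunk = self.buffer.popleft()
        self.in_flight -= len(chunk)
        return chunk


def relay_result_set(chunks, window_bytes):
    """Relay all chunks in order; return (delivered chunks, peak buffered bytes)."""
    relay = WindowedRelay(window_bytes)
    pending = deque(chunks)
    delivered = []
    while pending or relay.buffer:
        # Read from the backend only while the window has room.
        while pending and relay.can_read(len(pending[0])):
            relay.on_backend_chunk(pending.popleft())
        # Window full or backend drained: wait for the client to consume one chunk.
        delivered.append(relay.on_client_ack())
    return delivered, relay.peak


chunks = [b"x" * 400 for _ in range(10)]  # a 4000-byte result set
delivered, peak = relay_result_set(chunks, window_bytes=1000)
```

Here the whole result set is delivered in order while at most two 400-byte chunks are buffered at any time, so proxy memory stays bounded by the window rather than by the result-set size.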
Future Directions
The next major challenge is extending the architecture to multi‑Availability‑Zone (multi‑AZ) deployments. A simplified three‑AZ design provisions each AZ to handle 50% of traffic, so that the two remaining AZs can carry the full load if one fails, with automatic failover via Kubernetes elasticity. However, multi‑AZ introduces additional latency and IT‑cost considerations. Sidecar‑based solutions such as RedisMesh mitigate cross‑AZ latency by colocating the proxy with the application instance.
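The capacity arithmetic behind the three-AZ figure can be checked in a few lines. This assumes the 50% number means the share of peak traffic each AZ is provisioned to carry, which is an interpretation rather than something the source spells out; the function names are illustrative.

```python
def surviving_capacity(az_count, per_az_share, failed_azs=1):
    """Fraction of peak traffic the surviving AZs can carry after failures."""
    return (az_count - failed_azs) * per_az_share


def provisioned_capacity(az_count, per_az_share):
    """Total provisioned capacity as a fraction of peak traffic."""
    return az_count * per_az_share


after_one_failure = surviving_capacity(3, 0.5)  # two AZs left at 50% each
total = provisioned_capacity(3, 0.5)            # capacity paid for across all AZs
```

Under that assumption, losing one AZ still leaves capacity for 100% of peak traffic, at the price of provisioning 150% overall, which is one source of the IT-cost concern mentioned above.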
The long‑term roadmap envisions transforming DBProxy, KafkaGateway, and similar components into sidecar‑mode DataMesh services that integrate tightly with cloud‑native infrastructure, offering better scalability and cloud‑agnostic adaptability.
Additional Application Scenarios
SQL safety auditing across all databases without invasive instrumentation.
Deep SQL insight with fingerprinting and trace correlation for precise performance debugging.
Pre‑release SQL risk detection using the same insight engine.
Traffic simulation to validate capacity planning and avoid over‑provisioning.
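SQL fingerprinting is commonly done by stripping literals from a statement and hashing what remains, so that queries differing only in parameter values share one ID. The sketch below shows that general technique. The source does not describe how DBProxy computes its SQLID, so the regular expressions, the SHA-1 hash, the 16-character truncation, and the function names are all assumptions.

```python
import hashlib
import re

_STRING = re.compile(r"'(?:[^'\\]|\\.|'')*'")          # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")             # standalone numeric literals
_IN_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)*\s*\)")   # (?, ?, ?) -> (?)
_SPACE = re.compile(r"\s+")


def normalize(sql):
    """Replace literals with '?' and collapse whitespace and case."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    sql = _IN_LIST.sub("(?)", sql)
    return _SPACE.sub(" ", sql).strip().lower()


def sql_id(sql):
    """Short, stable fingerprint of the normalized statement (illustrative format)."""
    return hashlib.sha1(normalize(sql).encode("utf-8")).hexdigest()[:16]


a = "SELECT * FROM orders WHERE id = 42 AND city = 'Shenzhen'"
b = "select *  from orders where id = 7 and city = 'Hong Kong'"
same_shape = sql_id(a) == sql_id(b)
```

Both statements normalize to `select * from orders where id = ? and city = ?`, so they share a fingerprint and their latency or error metrics can be aggregated under one ID.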
dbaplus Community
