How HuoLaLa Built a Hybrid‑Cloud Database Governance Platform

This article outlines HuoLaLa's journey from a fragmented multi‑cloud environment to a unified, platform‑driven database governance system, detailing the technical challenges, architectural decisions, key components for MySQL, Redis, Kafka, and other middleware, and the measurable stability and cost improvements achieved after a year of operation.

dbaplus Community

Hybrid Cloud Database Governance Background

HuoLaLa operates in many countries and regions, which makes its technical environment highly complex. As cloud adoption accelerated, the company needed to build systems quickly on hybrid‑cloud infrastructure, but managing databases across multiple clouds proved difficult.

Challenges

Unexpected costs arising from cloud features such as backups, from billing differences between providers, and from default settings.

Inexperience with numerous databases and middleware, leading to a steep learning curve.

Product differences between cloud providers (e.g., DRDS vs. alternatives) making cross‑cloud migration hard.

The need for unified multi‑site operations and a consistent developer experience.

Pain Points & Demands

Stability – high fault frequency, with P0‑P2 incidents common.

Developer efficiency – DBAs were manually handling many requests, causing delays.

Cost – the cloud pay‑as‑you‑go model created early‑stage cost pressure.

Governance Principles

The team adopted a “do less, focus more” approach: cut unnecessary database choices, define strict standards, build platform capabilities, and aim for a minimum viable product that scores about 70 out of 100 rather than a perfect one.

Platform Architecture

A micro‑service based Service‑API layer with a unified gateway was built; each site runs its own services but shares a common data bus for monitoring and task distribution.

Key Components

MySQL Platform

Health dashboard based on weighted metrics of about 50 indicators, allowing one‑click visibility of instance health.

Alarm aggregation and reduction to avoid alert fatigue.

DDL handling via gh‑ost with a task queue and concurrency control to process thousands of daily DDL statements.

SQLReview built on TiDB’s parser for static analysis, integrating DBA experience into automated checks.
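The weighted health score behind the dashboard can be illustrated with a minimal sketch. The article does not list the roughly 50 indicators or their weights, so the three indicator names, the weights, and the `health_score` function below are all hypothetical; the sketch only shows the general idea of folding per‑indicator penalties into a single 0 to 100 score.

```python
from typing import Dict

# Hypothetical indicators and weights (assumption): the article mentions
# about 50 indicators but does not name them or give their weights.
WEIGHTS: Dict[str, int] = {
    "cpu_usage": 5,
    "slow_queries": 3,
    "replication_lag": 2,
}


def health_score(penalties: Dict[str, float],
                 weights: Dict[str, int] = WEIGHTS) -> float:
    """Return a 0-100 health score from per-indicator penalties.

    Each penalty is expected in [0, 1]: 0 means the indicator is healthy,
    1 means it is fully degraded. Values outside that range are clamped,
    and indicators missing from `penalties` are treated as healthy.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_penalty = sum(
        weight * min(max(penalties.get(name, 0.0), 0.0), 1.0)
        for name, weight in weights.items()
    )
    return round(100.0 * (1.0 - weighted_penalty / total_weight), 1)


if __name__ == "__main__":
    # A fully degraded CPU indicator costs half of the score in this sketch.
    print(health_score({"cpu_usage": 1.0}))
```

A single number per instance is what makes a one‑click overview possible; the individual penalties can still be shown when a user drills into a low‑scoring instance.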

Redis Platform

Monitoring dashboard mirroring the MySQL approach.

Instance replacement and scaling to handle memory‑over‑sell scenarios.

Key analysis via lightweight Go agents that keep memory usage under 100 MB even for >20 GB RDB files.

Cluster‑mode migration and ServiceMesh sidecar for multi‑language access and reduced cross‑AZ latency.
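The article does not explain how the key‑analysis agents keep memory low. One common technique, shown here as an assumption rather than HuoLaLa's actual method, is to stream entries one at a time and retain only the N largest in a fixed‑size heap, so memory depends on N and not on the size of the RDB file. The sketch is in Python for brevity (the agents themselves are described as Go programs), RDB parsing is not shown, and `top_n_big_keys` and its `(key, size)` input are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def top_n_big_keys(entries: Iterable[Tuple[str, int]],
                   n: int = 3) -> List[Tuple[str, int]]:
    """Return the n largest (key, size_in_bytes) pairs, largest first.

    `entries` may be a generator, so the full key set is never held in
    memory; only a min-heap of at most n items is kept.
    """
    if n <= 0:
        return []
    heap: List[Tuple[int, str]] = []  # min-heap ordered by size
    for key, size in entries:
        if len(heap) < n:
            heapq.heappush(heap, (size, key))
        elif size > heap[0][0]:
            # The new key is larger than the smallest retained key.
            heapq.heapreplace(heap, (size, key))
    return [(key, size) for size, key in sorted(heap, reverse=True)]


if __name__ == "__main__":
    sample = (pair for pair in [("a", 10), ("b", 500), ("c", 30), ("d", 200)])
    print(top_n_big_keys(sample, n=2))
```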

Kafka Platform

Unified management of clusters, topics, and consumers.

Time‑based consumer lag calculation (instead of raw offset lag) for more intuitive alerting.

Metadata gateway enforcing resource‑ID based access, binding topics to consumers securely.

Multi‑tenant isolation via zone‑based broker groups, allowing high‑spec and low‑spec topics to coexist without interference.
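Time‑based lag can be sketched as follows. The article does not give the exact formula, so this assumes one plausible definition: the age of the oldest message the consumer group has not yet processed. The function names and the 300‑second threshold are hypothetical, and fetching the message timestamp from Kafka is left out.

```python
from typing import Optional


def time_lag_seconds(next_unconsumed_ts: Optional[float], now: float) -> float:
    """Return consumer lag in seconds.

    `next_unconsumed_ts` is the timestamp (epoch seconds) of the first
    message after the committed offset, or None when the consumer has
    caught up with the end of the partition.
    """
    if next_unconsumed_ts is None:
        return 0.0
    # Clamp at zero in case of clock skew between producer and monitor.
    return max(0.0, now - next_unconsumed_ts)


def should_alert(lag_seconds: float, threshold_seconds: float = 300.0) -> bool:
    """Alert when the consumer is more than `threshold_seconds` behind."""
    return lag_seconds > threshold_seconds


if __name__ == "__main__":
    lag = time_lag_seconds(next_unconsumed_ts=940.0, now=1000.0)
    print(lag, should_alert(lag))
```

An offset lag of 10,000 messages may mean one second of delay on a busy topic or several hours on a quiet one, whereas "five minutes behind" means the same thing for every topic, which is why a time threshold tends to be easier to reason about when alerting.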

Other Middleware (ES, RabbitMQ, Canal)

Integrated via their web‑admin APIs; metadata is cached and exposed through the platform for unified control, enabling centralized monitoring, index rotation, and data subscription pipelines.

Self‑service for Developers

Developers can approve work orders, query MySQL/Redis/ES/Kafka, and perform releases through a mini‑program UI, dramatically reducing the DBAs' manual workload.

Results After One Year

Significant reduction in incident frequency and severity; daily alarm volume dropped from hundreds to a manageable level.

Cost savings from better resource utilization (40‑60% memory reduction) and automated scaling.

Standardized operations across all services, with a clear governance dashboard and data‑driven decision making.

Reflections on DBA Role in the Cloud Era

DBAs shift from low‑level infrastructure work to performance, cost, and business‑oriented optimization, requiring broader skill sets and embracing cloud‑native tools. Understanding cloud product nuances becomes more valuable than deep kernel knowledge, and the future may see platform functions moving directly to cloud‑native services.

// Default configuration
$redisList = [
    'tcp://127.0.0.1:7000?timeout=3.0',
    'tcp://127.0.0.1:7000?timeout=3.0',
    'tcp://127.0.0.1:7000?timeout=3.0',
    'tcp://127.0.0.1:7000?timeout=3.0',
];
// Resolved by binding slot ranges to each node
$redisList = [
    'tcp://127.0.0.1:7000?timeout=3.0&slots=1-100',
    'tcp://127.0.0.1:7000?timeout=3.0&slots=101-200',
    'tcp://127.0.0.1:7000?timeout=3.0&slots=201-300',
    'tcp://127.0.0.1:7000?timeout=3.0&slots=301-400',
    //...
];