
How to Build a Scalable Automated MySQL Operations Platform

This article explains how to standardize and automate MySQL operations—including multi‑instance deployment, metadata collection, monitoring, backup, and high‑availability using Zookeeper—so that large‑scale database services can be provisioned, managed, and scaled with minimal human intervention.

Weidian Tech Team

Introduction

In the Internet era, databases must support agile development and delivery. Traditional DBA-driven, manual processes cannot keep pace with the volume of business requests.

Database Usage Patterns

Two main patterns are described: mixed‑deployment (shared instance) and dedicated‑instance per application. The shared approach is quick but leads to interference, lack of performance metrics, difficult troubleshooting, and inefficient resource use, often requiring costly manual sharding.

Dedicated instances solve these problems by giving each application its own MySQL instance (or sharded tables), but this requires many instances and a shift from single‑instance to multi‑instance operations.

Problems to Address

How can we quickly provision databases, allocate resources rationally, monitor every instance, and provide high availability across a large number of instances?

MySQL Standardization and Automation

Our automated MySQL ops system handles large‑scale pain points such as instance creation, deployment, monitoring, backup, HA switching, migration, and scaling with minimal human intervention.

Standardization

Application‑dedicated instances

Master‑only data flow; replicas do not serve traffic (except for disaster recovery)

Consistent MySQL version and configuration (identical my.cnf except server_id, buffer_pool_size, etc.)

Uniform directory layout

Automation Framework (DBManage)

A comprehensive platform for large-scale operations that automates the full instance lifecycle without requiring trust relationships (e.g., SSH keys) between machines.

Multi‑Instance Creation

Run multiple MySQL processes on different ports of a single host, sharing the binary but using separate configuration files and data directories.

Identical my.cnf (except unique parameters)

Standardized data directory

Instance creation via rsync and custom my.cnf to avoid heavy I/O from mysql_install_db
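This creation step can be sketched in Python, assuming a standardized /data/mysql/&lt;port&gt;/ layout and a pre-initialized template data directory; all paths, parameter names, and the template contents here are illustrative:

```python
import subprocess
from pathlib import Path

# Everything except these few values is identical across instances.
MYCNF_TEMPLATE = """[mysqld]
port = {port}
server_id = {server_id}
datadir = /data/mysql/{port}/data
socket = /data/mysql/{port}/mysql.sock
innodb_buffer_pool_size = {buffer_pool_gb}G
"""

def render_mycnf(port: int, server_id: int, buffer_pool_gb: int) -> str:
    """Render a per-instance my.cnf from the shared template."""
    return MYCNF_TEMPLATE.format(port=port, server_id=server_id,
                                 buffer_pool_gb=buffer_pool_gb)

def create_instance(port: int, server_id: int, buffer_pool_gb: int,
                    template_dir: str = "/data/mysql/template/data/") -> None:
    """Clone a pre-initialized data directory with rsync instead of
    running mysql_install_db, avoiding its heavy initialization I/O."""
    base = Path(f"/data/mysql/{port}")
    base.mkdir(parents=True, exist_ok=True)
    (base / "my.cnf").write_text(render_mycnf(port, server_id, buffer_pool_gb))
    subprocess.check_call(["rsync", "-a", template_dir, str(base / "data")])
```

Because every instance follows the same layout, the only per-instance inputs are the port, server_id, and buffer pool size.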

Metadata and Monitoring

A custom db_agent runs on each server. It automatically discovers instances, collects metadata, performance metrics (QPS, TPS, RT), heartbeats, slow SQL, and host metrics, eliminating manual entry and external tools such as Zabbix.
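The discovery and collection logic might look like the following sketch; the directory glob relies on the standardized layout described above, and the QPS/TPS derivation from SHOW GLOBAL STATUS counter deltas is one common approach, not necessarily the platform's exact formula:

```python
import glob
import re

def discover_ports(conf_glob: str = "/data/mysql/*/my.cnf") -> list:
    """Find local instances by scanning the standardized directory layout."""
    ports = []
    for path in glob.glob(conf_glob):
        m = re.search(r"/mysql/(\d+)/my\.cnf$", path)
        if m:
            ports.append(int(m.group(1)))
    return sorted(ports)

def qps_tps(prev: dict, curr: dict, interval: float) -> tuple:
    """Derive QPS/TPS from two SHOW GLOBAL STATUS samples
    taken `interval` seconds apart."""
    qps = (curr["Questions"] - prev["Questions"]) / interval
    tps = (curr["Com_commit"] + curr["Com_rollback"]
           - prev["Com_commit"] - prev["Com_rollback"]) / interval
    return qps, tps
```

Because the agent discovers instances on its own, adding an instance to a host requires no registration step at all.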

Backup

Backup scripts deployed on all database machines handle backup of replicas, concurrent multi‑instance control, and direct storage to HDFS or a server.
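The backup flow could be scripted as follows; the xtrabackup invocation, the HDFS destination path, the naming scheme, and the per-host concurrency cap are all illustrative assumptions:

```python
import datetime
import subprocess
from concurrent.futures import ThreadPoolExecutor

def backup_name(host: str, port: int, when: datetime.date) -> str:
    """Standardized backup file name: <host>_<port>_<YYYYMMDD>.xbstream"""
    return f"{host}_{port}_{when:%Y%m%d}.xbstream"

def backup_instance(host: str, port: int) -> None:
    """Stream a replica backup straight into HDFS so nothing lands on
    local disk. Command and paths are illustrative."""
    name = backup_name(host, port, datetime.date.today())
    cmd = (f"xtrabackup --backup --stream=xbstream "
           f"--defaults-file=/data/mysql/{port}/my.cnf "
           f"| hdfs dfs -put - /backup/mysql/{name}")
    subprocess.check_call(cmd, shell=True)

def backup_all(host: str, ports: list, max_parallel: int = 2) -> None:
    """Cap concurrent backups per machine to bound I/O pressure."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for port in ports:
            pool.submit(backup_instance, host, port)
```

Backing up only replicas keeps backup I/O off the masters, and the concurrency cap prevents a many-instance host from saturating its disks.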

Remote Execution Agent

Allows remote creation, recovery, and other operations by invoking scripts on the target DB machine.
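One way such an agent can work, sketched with Python's standard socketserver and a whitelist of local scripts; the action names and script paths are hypothetical:

```python
import json
import socketserver
import subprocess

# Only whitelisted actions may be invoked remotely (paths are hypothetical).
ALLOWED = {
    "create_instance": "/usr/local/dbmanage/create_instance.sh",
    "restore_backup": "/usr/local/dbmanage/restore_backup.sh",
}

def build_command(request: dict) -> list:
    """Map a JSON request onto a whitelisted local script; reject anything else."""
    action = request.get("action")
    if action not in ALLOWED:
        raise ValueError(f"unknown action: {action}")
    return [ALLOWED[action]] + [str(a) for a in request.get("args", [])]

class AgentHandler(socketserver.StreamRequestHandler):
    """One JSON request per line in, one JSON reply per line out."""
    def handle(self):
        request = json.loads(self.rfile.readline())
        try:
            out = subprocess.check_output(build_command(request))
            reply = {"status": "ok", "output": out.decode()}
        except Exception as exc:
            reply = {"status": "error", "message": str(exc)}
        self.wfile.write(json.dumps(reply).encode() + b"\n")
```

Dispatching only whitelisted scripts is what lets the platform avoid machine-to-machine trust relationships: the agent never executes arbitrary remote commands.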

Monitoring and Alerting

Metrics collected by db_agent are visualized in Graphite and trigger alerts.
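Feeding Graphite is straightforward with its plaintext protocol (one "path value timestamp" line per metric, sent to carbon on port 2003); the metric path naming and the carbon host below are assumptions:

```python
import socket
import time

def graphite_lines(host_id: str, port: int, metrics: dict, ts=None) -> list:
    """Render metrics in Graphite's plaintext protocol:
    '<path> <value> <timestamp>'. Path scheme here is illustrative."""
    ts = int(ts if ts is not None else time.time())
    prefix = f"mysql.{host_id}.{port}"
    return [f"{prefix}.{name} {value} {ts}"
            for name, value in sorted(metrics.items())]

def push(lines: list, carbon_host: str = "graphite.internal",
         carbon_port: int = 2003) -> None:
    """Send a batch of metric lines to carbon over a plain TCP socket."""
    payload = ("\n".join(lines) + "\n").encode()
    with socket.create_connection((carbon_host, carbon_port), timeout=5) as s:
        s.sendall(payload)
```

Alerting can then be driven off the same metric tree, so the collection path and the alerting path never disagree about a value.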

MySQL High Availability

Instead of traditional MHA with VIP, we use Zookeeper-based HA with VDDS, removing the VIP and MHA components. Instances register ephemeral nodes in Zookeeper; ha_console builds switch metadata and performs failover based on delay and replication status.

The HA workflow includes VDDS configuration, ha_agent registration, ha_console metadata construction, node monitoring, and automatic switchover with logging. The switch metadata ha_console records for a replica looks like:

{
  "source_db_role": "slave",
  "master_instance": "192.168.1.12_3306",
  "repl_status": "ok",
  "h_time_delay": 0,
  "repl_delay": 0,
  "last_change_time": "2016-10-15-01:00:45"
}
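The two sides can be sketched as follows, using the kazoo client listed in the references for the agent side and metadata shaped like the example above for the console side; the znode paths are assumptions, and the selection policy shown (healthy replication first, then lowest delay) is a simplification of a real failover decision:

```python
def pick_new_master(replicas: dict):
    """ha_console side: choose the healthiest replica. `replicas` maps
    'ip_port' identifiers to switch-metadata dicts; policy here is a
    simplification: replication must be OK, then smallest delay wins."""
    healthy = {k: v for k, v in replicas.items()
               if v.get("repl_status") == "ok"}
    if not healthy:
        return None
    return min(healthy, key=lambda k: (healthy[k]["repl_delay"], k))

def register(instance_id: str,
             zk_hosts: str = "zk1:2181,zk2:2181,zk3:2181"):
    """ha_agent side: register an ephemeral znode for this instance.
    If the agent (or its host) dies, the node vanishes automatically,
    which is what signals ha_console to consider a switch."""
    from kazoo.client import KazooClient  # third-party: kazoo
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    zk.create(f"/mysql_ha/instances/{instance_id}", b"",
              ephemeral=True, makepath=True)
    return zk
```

The ephemeral-node semantics are what make a VIP unnecessary: liveness is tracked by Zookeeper sessions rather than by a floating IP address.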

DBTask Automation

DBTask replaces manual steps, breaking complex actions (e.g., database migration) into discrete scripts that are chained together.

Database migration steps: request resources, create instance, restore replicas, configure VDDS, pre‑switch checks, switch, clean VDDS, shut down old instance.
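The chaining idea can be sketched as a list of named steps executed in order, halting at the first failure; the runner below is a hypothetical simplification, with step names mirroring the migration flow above:

```python
def run_task(steps: list, ctx: dict) -> list:
    """Execute (name, step) pairs in order; stop at the first failure so an
    operator can inspect, fix, and resume rather than unwind a
    half-finished migration."""
    results = []
    for name, step in steps:
        ok = step(ctx)
        results.append((name, ok))
        if not ok:
            break
    return results

# Step names mirroring the migration flow; in DBTask each would wrap a
# discrete script invoked on the relevant machine.
MIGRATION_STEPS = [
    "request_resources", "create_instance", "restore_replicas",
    "configure_vdds", "pre_switch_checks", "switch",
    "clean_vdds", "shutdown_old_instance",
]
```

Keeping each step as a discrete, idempotent script is what lets the same building blocks serve migration, scaling, and recovery tasks.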

Results and Outlook

The full automation stack uses Python, Shell, and Go (for slow‑SQL parsing) on the backend and Laravel + AngularJS on the frontend. A single host runs over 100 instances; provisioning takes less than a minute; automated sharding is in progress, moving the database toward a private‑cloud platform.

References

Python socket communication: https://github.com/chris-piekarski/python-json-socket

Python HDFS client: https://pypi.python.org/pypi/hdfs/

RT measurement tool: https://github.com/Lowercases/tcprstat

Python Zookeeper client: https://github.com/python-zk/kazoo

Go TiDB SQL parser: https://github.com/pingcap/tidb
