Databases 12 min read

How to Build a Scalable, Automated MySQL Operations Platform

This article explains how to standardize and automate MySQL management at scale, covering dedicated instance deployment, configuration consistency, multi‑instance creation, metadata collection, backup, monitoring, high‑availability with Zookeeper, and task orchestration using DBTask to achieve rapid, reliable database services.

dbaplus Community

Dec 26, 2016

How to Build a Scalable, Automated MySQL Operations Platform

Usage Patterns and Challenges

Two main usage patterns are common in large‑scale MySQL deployments:

Mixed (shared) deployment : a new database is created on an existing instance for speed, but applications interfere with each other, performance metrics (QPS, TPS, RT) are unavailable, root‑cause analysis is hard, and resource utilization is inefficient.

Dedicated instance per application : each application runs on its own MySQL instance, which aligns with virtualization and micro‑services, but requires a shift from single‑instance to multi‑instance management.

These patterns raise questions about fast service provisioning, resource allocation, monitoring, and high‑availability for many instances.

Standardization

One dedicated MySQL instance per application.

Master serves traffic; replicas are used only for disaster recovery.

MySQL compiled with thread_pool support.

Uniform MySQL version across all nodes.

Identical my.cnf files except for instance‑specific parameters such as server_id and buffer_pool_size.

Consistent data‑directory layout.

DBManage Automation Platform

The DBManage system automates instance creation, deployment, monitoring, backup, HA switching, migration and scaling with minimal human intervention.

1. Multi‑Instance Creation

Multiple MySQL processes run on a single host, each bound to a different port and using its own my.cnf while sharing the same binary.

Standardized my.cnf (only server_id, buffer_pool_size differ).

Standardized data directory.

Creation steps:

Initialize a reference database.

Copy the reference to a new instance with rsync while throttling I/O.

Adjust the instance‑specific my.cnf. Direct mysql_install_db is avoided because it is I/O‑heavy.

2. Metadata Collection & Monitoring

A custom db_agent runs on each MySQL host, discovers instances automatically and reports metadata and performance metrics.

Instance information: databases, replication topology, table metadata.

Per‑second heartbeat (more reliable than SHOW SLAVE STATUS).

Performance counters: QPS, TPS, response time (RT) collected via tcprstat.

Real‑time slow‑SQL detection.

Host‑level resource usage (CPU, disk, network) replaces external tools such as Zabbix.

3. Backup

Unified backup script works for both masters and replicas.

Only replicas are backed up after role verification.

Concurrent backup throttling across instances.

Backup destination can be HDFS or a regular file server (HDFS recommended).

4. Local Execution Agent

Remote operations (instance creation, restoration, etc.) are triggered by sending a message that invokes the corresponding script on the target host via the local agent.

5. Monitoring & Alerting

Collected metrics are stored in Graphite; dashboards and alerts are built on top of this time‑series data.

6. MySQL High‑Availability

Traditional MHA‑based HA is replaced by a Zookeeper‑driven solution that eliminates virtual IPs and uses VDDS for service discovery.

VDDS creates a new application configuration. ha_agent registers a temporary Zookeeper node and continuously updates instance status. ha_console builds switch metadata (source role, master instance, replication status, delay, last change time) from Zookeeper nodes.

The console watches the alive directory; disappearance of a node triggers a failover after verifying delay and replication health.

Switch actions are logged, e.g.

master_log_file: mysql-bin.000007, exec_master_log_pos: 57830

{
  "source_db_role": "slave",
  "master_instance": "192.168.1.12_3306",
  "repl_status": "ok",
  "h_time_delay": 0,
  "repl_delay": 0,
  "last_change_time": "2016-10-15-01:00:45"
}

Supported failure scenarios include instance crash, host crash, and network partition, each resulting in an automatic and safe failover.

7. DBTask – Orchestrating Database Operations

DBTask replaces manual procedures with scripted tasks. Each script performs a single atomic action; tasks are chained to implement complex workflows such as migration or resource provisioning.

Example migration workflow:

Apply for resources.

Create instance.

Restore replica A.

Restore replica B.

Configure data source in VDDS.

Pre‑switch checks.

Execute switch.

Remove VDDS configuration.

Shutdown old instance.

Example resource‑request workflow:

Apply for resources.

Create instance.

Create database and MySQL user.

Configure data source in VDDS.

Results & Outlook

The automation stack is built with Python, Shell and Go (real‑time slow‑SQL parsing) on the backend and Laravel + AngularJS on the frontend. It currently manages over 100 instances per machine, fulfills a new database request in under one minute, and supports automated sharding and scaling. Future work aims to evolve the platform into a private‑cloud service for broader business needs.

References

Python socket communication: https://github.com/chris-piekarski/python-json-socket

Python HDFS client: https://pypi.python.org/pypi/hdfs/

Response‑time tool (rt): https://github.com/Lowercases/tcprstat

Python Zookeeper client (Kazoo): https://github.com/python-zk/kazoo

TiDB SQL parser (used for table merging): https://github.com/pingcap/tidb

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Monitoring Automation High Availability Zookeeper mysql Database operations DBTask

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.