How ZanDB Automates MySQL Backup, Task Scheduling, and Instance Management
This article outlines ZanDB's approach to standardizing MySQL environments, implementing a real‑time backup monitoring system, and building a comprehensive automation platform that includes task scheduling, host and instance management, log handling, metadata tracking, and daily maintenance to streamline DBA operations.
Background
As internet services grow, DBA workloads grow with them: instances must be delivered faster, and database optimization and backup management take more time. Tracking dozens of database instances by hand and from memory is no longer feasible, so DBAs need batch management of backups, metadata, and scheduled scripts, along with rapid provisioning.
Database Standardization
All instances on a host reside under a unified directory, distinguished by port numbers (e.g., my3306, my3307), with separate data, log, and runtime sub‑directories.
Each instance has its own configuration file; only server_id and buffer_pool_size differ, while other parameters remain consistent.
Production MySQL software directories and versions are kept identical across hosts.
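A minimal sketch of how such a port-based layout could be derived in code. The base directory /data/mysql and the sub-directory names data, log, and run are assumptions for illustration; the article does not give the real paths.

```python
from pathlib import PurePosixPath

# Hypothetical base directory; the real path is not stated in the article.
BASE_DIR = PurePosixPath("/data/mysql")


def instance_paths(port, base_dir=BASE_DIR):
    """Return the standard directories for the instance listening on `port`."""
    root = base_dir / f"my{port}"  # e.g. my3306, my3307
    return {
        "root": root,
        "data": root / "data",  # assumed name of the data sub-directory
        "log": root / "log",    # assumed name of the log sub-directory
        "run": root / "run",    # assumed name of the runtime sub-directory
    }
```

Because every path is a pure function of the port, scripts can locate any instance on any host without a per-host lookup table.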
Phase 1: Backup Monitoring System
The first step focused on backups, the most critical task for DBAs. ZanDB’s backup monitoring system provides:
Real‑time overview of backup status, showing total instances to back up and completed ones.
Duration of each backup operation.
Statistics for the past five days, including total backup count and size.
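The figures listed above could be computed from per-backup records roughly as follows. This is a sketch only: the BackupRecord fields and both function names are hypothetical, since the article does not describe ZanDB's actual data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BackupRecord:
    instance: str
    day: date
    duration_s: Optional[float]  # None while the backup has not finished
    size_bytes: int = 0


def today_progress(records, today):
    """Return (completed, total) for the backups scheduled today."""
    todays = [r for r in records if r.day == today]
    done = [r for r in todays if r.duration_s is not None]
    return len(done), len(todays)


def five_day_stats(records, today):
    """Map each of the last five days to (finished backup count, total size)."""
    stats = {}
    for offset in range(5):
        day = today - timedelta(days=offset)
        finished = [
            r for r in records if r.day == day and r.duration_s is not None
        ]
        stats[day] = (len(finished), sum(r.size_bytes for r in finished))
    return stats
```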
Phase 2: Comprehensive Automation Platform
After the backup monitor, ZanDB expanded into a full‑featured automation platform divided into seven functional modules:
1. Task System
A robust scheduler supports daily, weekly, monthly, and interval‑based tasks. It consists of an agent that executes tasks on hosts and a central scheduler that stores task definitions and timing strategies, eliminating the need for host‑level crontab entries.
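To illustrate the four timing strategies, here is a sketch of how a central scheduler might compute a task's next run time. The function name, its parameters, and the restriction of monthly tasks to days 1 through 28 are assumptions made for this example, not details from ZanDB.

```python
from datetime import datetime, timedelta


def next_run(kind, now, *, hour=2, weekday=0, day=1,
             interval=timedelta(hours=1)):
    """Return the next run time strictly after `now` for a timing strategy.

    kind: "interval", "daily", "weekly" (weekday 0 = Monday) or "monthly".
    """
    if kind == "interval":
        return now + interval
    # Today's slot at hour:00; later branches move it forward as needed.
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if kind == "daily":
        return candidate if candidate > now else candidate + timedelta(days=1)
    if kind == "weekly":
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        return candidate if candidate > now else candidate + timedelta(days=7)
    if kind == "monthly":
        if not 1 <= day <= 28:
            raise ValueError("this sketch only supports days 1..28")
        candidate = candidate.replace(day=day)
        if candidate > now:
            return candidate
        # Roll over to the same day of the following month.
        if now.month == 12:
            return candidate.replace(year=now.year + 1, month=1)
        return candidate.replace(month=now.month + 1)
    raise ValueError(f"unknown schedule kind: {kind}")
```

Keeping this logic in the central scheduler, with agents only executing what they are handed, is what lets the timing rules live in one place instead of in each host's crontab.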
2. Backup Management
Integrated with the task system, it allows easy configuration of backup windows, target instances, and success/failure callbacks. Failed backups are logged with error details, enabling quick retries without manual host access. Daily verification of core database backups triggers alerts via WeChat or SMS when validation fails.
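The callback-and-retry flow can be sketched as below. Everything here is hypothetical: run_backup, retry_failed, and the shape of the failure log are illustrative names, and do_backup stands in for whatever command the agent actually runs.

```python
def run_backup(instance, do_backup, on_success, on_failure, failures):
    """Run one backup, fire the matching callback, and log any failure."""
    try:
        result = do_backup(instance)
    except Exception as exc:
        # Keep the error text so a failed backup can be diagnosed and
        # retried without logging in to the host.
        failures.append({"instance": instance, "error": str(exc)})
        on_failure(instance, exc)
        return False
    on_success(instance, result)
    return True


def retry_failed(failures, do_backup, on_success, on_failure):
    """Retry every logged failure; return the instances that now succeed."""
    pending = [entry["instance"] for entry in failures]
    failures.clear()  # instances that fail again are logged afresh
    return [
        instance for instance in pending
        if run_backup(instance, do_backup, on_success, on_failure, failures)
    ]
```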
3. Host Management
Host metadata (disk size, free space, memory) is automatically fetched from Zabbix during host addition, providing a reliable inventory for subsequent operations.
4. Instance Management
Supports one‑to‑many host‑instance relationships and offers:
Viewing instance lists with data size, log size, replication status, slow‑query and killed‑SQL counts, and performance history.
Adding new instances by rsync‑ing a standard database template and rendering a my.cnf configuration file; the process is visible in the workflow system and supports retry on failure.
Automated master‑slave consistency checks for core databases to detect replication drift early.
Splitting multiple schemas from a single instance into separate instances.
Daily snapshots of instance metadata (slow queries, directory sizes) for historical analysis.
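As an illustration of the my.cnf rendering step used when adding an instance, the sketch below fills a small template with Python's string.Template. The template contents, the base directory, and the function name are assumptions; a real template would carry many more shared parameters, and the port and datadir lines appear here only because the directory layout is port-based.

```python
from string import Template

# Hypothetical, heavily shortened template; shared parameters are omitted.
MY_CNF_TEMPLATE = Template("""\
[mysqld]
port = ${port}
server_id = ${server_id}
datadir = ${base_dir}/my${port}/data
innodb_buffer_pool_size = ${buffer_pool_size}
""")


def render_my_cnf(port, server_id, buffer_pool_size, base_dir="/data/mysql"):
    """Render the per-instance configuration text from the shared template."""
    # substitute() raises KeyError if a placeholder is left unfilled, which
    # prevents writing a half-rendered configuration file.
    return MY_CNF_TEMPLATE.substitute(
        port=port,
        server_id=server_id,
        buffer_pool_size=buffer_pool_size,
        base_dir=base_dir,
    )
```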
5. Log Management
The log system aggregates slow‑query and killed‑SQL data from agents, rotates slow‑query logs daily, and uses pt‑query‑digest to parse and cache results. It also tracks top killed‑SQL statements, enabling quick identification of problematic queries.
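Ranking top killed statements requires grouping queries that differ only in their literal values. The sketch below uses a deliberately simple regex-based fingerprint as a stand-in; it is not pt-query-digest's fingerprinting algorithm, and the function names are hypothetical.

```python
import re
from collections import Counter

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")  # single-quoted string literals
_NUMBER = re.compile(r"\b\d+\b")            # standalone integer literals
_SPACE = re.compile(r"\s+")


def fingerprint(sql):
    """Replace literals with '?' so similar statements group together."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip().lower()


def top_killed(statements, n=10):
    """Return the n most frequent statement fingerprints with their counts."""
    return Counter(fingerprint(s) for s in statements).most_common(n)
```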
6. Metadata Management
Handles binlog metadata, primary‑key overflow checks, and sharding information. Binlog records include start/end timestamps and retention periods, facilitating rapid point‑in‑time recovery. Primary‑key overflow alerts prevent insert failures, and a sharding metadata query maps database name, shard ID, and shard count to the responsible instance.
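A primary-key overflow check boils down to comparing a table's current auto-increment value with the maximum its integer column type can hold. The sketch below shows that arithmetic; the 80% alert threshold and the function names are assumptions, and how the current value is collected is left out.

```python
# Storage width in bits of MySQL's integer column types.
_BITS = {"tinyint": 8, "smallint": 16, "mediumint": 24, "int": 32, "bigint": 64}


def max_value(column_type, unsigned=False):
    """Largest value a MySQL integer column of this type can store."""
    bits = _BITS[column_type.lower()]
    return 2 ** bits - 1 if unsigned else 2 ** (bits - 1) - 1


def overflow_risk(current_auto_increment, column_type, unsigned=False,
                  threshold=0.8):
    """Return (alert, fraction_used) for a primary key column.

    threshold is a hypothetical alerting level, not a value from the article.
    """
    used = current_auto_increment / max_value(column_type, unsigned)
    return used >= threshold, used
```

For example, a signed INT key at 2,000,000,000 has used about 93% of its range and would alert, while the same value in an unsigned INT column is under 50%.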
7. Daily Maintenance
Agents execute batch SQL commands and configuration changes across selected instances. Only maintenance‑related DML is allowed. Configuration changes are applied both to the running instance and to the on‑disk configuration file, so they survive a restart (e.g., adjusting the slow‑query threshold).
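Changing a setting in both places can be sketched as producing two artifacts: a SET GLOBAL statement for the running server and an updated configuration text for disk. The function name, the validation rules, and the assumption that the file has a single [mysqld] section are all illustrative; executing the statement and writing the file are left to the agent.

```python
import re

_NAME = re.compile(r"^[a-z_]+$")  # only plain variable names in this sketch


def plan_config_change(cnf_text, name, value):
    """Return (sql, new_cnf_text) for changing one numeric server variable."""
    if not _NAME.match(name):
        raise ValueError(f"unexpected variable name: {name!r}")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("this sketch only handles numeric values")
    sql = f"SET GLOBAL {name} = {value}"  # takes effect in memory
    lines, replaced = [], False
    for line in cnf_text.splitlines():
        if line.split("=", 1)[0].strip() == name:
            lines.append(f"{name} = {value}")  # persists across restarts
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        # Assumes the file ends inside the [mysqld] section.
        lines.append(f"{name} = {value}")
    return sql, "\n".join(lines) + "\n"
```

For the slow-query threshold mentioned above, the variable would be long_query_time.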
Technology Stack and Future Outlook
ZanDB is built with Python and Django, Percona Toolkit, a custom agent, and a web front end, using Redis for caching and MySQL as the backing store. Planned enhancements include automated performance diagnostics, intelligent slow‑query analysis, and automated sharding to further reduce manual DBA effort.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
