How ZTO’s Ark Platform Automates Database Operations and Boosts Efficiency
This article introduces ZTO Technology’s Ark Platform, a database PaaS solution that automates deployment, backup, monitoring, data flashback, and slow‑SQL optimization, detailing its architecture, core features, and the operational benefits it brings to large‑scale logistics environments.
Introduction
Amid the rapid growth of the logistics industry, ZTO Express handled 17 billion parcels in 2020, a 20.4% market share. To sustain this scale, ZTO's technology team of more than 1,000 engineers works under the motto "Technology empowers logistics," emphasizing full-stack in-house development with the goal of becoming a leader in "Internet + Logistics."
The company now manages more than 3,000 databases of various types (MySQL, MongoDB, TiDB, ES, etc.), with daily data growth exceeding 10 TB. To make database management automated, platform-based, data-driven, and intelligent, ZTO built the Ark Platform (a database management PaaS) and the IDB change-management platform.
Ark Platform Overview
1. Platform Design
The Ark Platform is a database operations automation platform launched in 2019 that has since gone through more than 30 version iterations. It covers 100% of online databases and automates 90% of DBAs' daily tasks. The system is divided into five layers (foundation, service, data, business, and monitoring), designed for flexibility, scalability, controllability, and security.
The backend is built with Python (Django and DRF) and the frontend with Vue. The stack also includes Nginx, with SNMP, Prometheus, ELK, and custom scripts used for monitoring.
2. Feature Overview
The platform provides seven modules with more than 15 functions, focusing on database operation automation, self‑service, intelligence, and service‑orientation.
Monitoring Center
The monitoring center replaces manual DBA inspections, offering proactive alerts, health checks (connections, threads, disk usage, TPS, QPS), InnoDB metrics, replication lag, backup status, and parameter monitoring.
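To make the idea of automated health checks concrete, here is a minimal sketch of threshold-based checking in Python. It is not Ark's actual implementation: the metric names, the threshold values, and the `check_instance` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds; the article does not state the real values.
THRESHOLDS: Dict[str, float] = {
    "connections_used_pct": 80.0,
    "disk_used_pct": 85.0,
    "replication_lag_s": 30.0,
}


@dataclass
class Alert:
    instance: str
    metric: str
    value: float
    threshold: float


def check_instance(instance: str, metrics: Dict[str, float]) -> List[Alert]:
    """Return one alert for every collected metric that exceeds its threshold."""
    alerts = []
    for metric, threshold in THRESHOLDS.items():
        value = metrics.get(metric)
        # Metrics that were not collected are skipped rather than treated as failures.
        if value is not None and value > threshold:
            alerts.append(Alert(instance, metric, value, threshold))
    return alerts
```

A scheduler could run a check like this against every instance and forward the resulting alerts to the notification channel, which is the kind of work that would otherwise fall to a DBA doing manual inspections.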
Data flashback enables rapid rollback of erroneous production data changes by reversing the corresponding binlog events. The system acts as a replica (slave) to pull the binlog stream, parses the binary events, and generates rollback scripts that can be executed remotely.
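The rollback-script step can be sketched as follows. This is an illustration under stated assumptions, not Ark's code: it assumes row-based binlog events have already been pulled and parsed into a hypothetical `RowEvent` structure holding the before and after images of each row, and it leaves out the binlog-pulling step entirely. The quoting in `literal` is deliberately simplistic and is not safe for production use.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RowEvent:
    """Hypothetical parsed row event: kind is 'insert', 'update', or 'delete'."""
    kind: str
    table: str
    before: Optional[Dict[str, Any]] = None  # row image before the change
    after: Optional[Dict[str, Any]] = None   # row image after the change


def literal(value: Any) -> str:
    """Render a Python value as a SQL literal (simplified quoting)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def where_clause(row: Dict[str, Any]) -> str:
    """Match a row on all of its columns, using IS for NULL values."""
    parts = []
    for col, val in row.items():
        op = "IS" if val is None else "="
        parts.append(f"`{col}` {op} {literal(val)}")
    return " AND ".join(parts)


def inverse_sql(event: RowEvent) -> str:
    """Build the statement that undoes a single row event."""
    if event.kind == "insert":
        # Undo an insert by deleting the inserted row.
        return f"DELETE FROM `{event.table}` WHERE {where_clause(event.after)} LIMIT 1;"
    if event.kind == "delete":
        # Undo a delete by re-inserting the old row image.
        cols = ", ".join(f"`{c}`" for c in event.before)
        vals = ", ".join(literal(v) for v in event.before.values())
        return f"INSERT INTO `{event.table}` ({cols}) VALUES ({vals});"
    if event.kind == "update":
        # Undo an update by restoring the before image on the row that
        # currently matches the after image.
        sets = ", ".join(f"`{c}` = {literal(v)}" for c, v in event.before.items())
        return f"UPDATE `{event.table}` SET {sets} WHERE {where_clause(event.after)} LIMIT 1;"
    raise ValueError(f"unsupported event kind: {event.kind}")


def rollback_script(events: List[RowEvent]) -> List[str]:
    """Invert each event and emit the statements in reverse order."""
    return [inverse_sql(e) for e in reversed(events)]
```

Reversing the order matters: the most recent change has to be undone first, otherwise an earlier inverse statement may not find the row it expects.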
Slow‑SQL handling captures slow queries, aggregates statistics (total count, avg QPS, avg latency, unused index count), and presents SQL templates. Users can filter by latency, execution count, or unused indexes to identify high‑value optimization targets, then track progress through a dedicated slow‑SQL report.
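The template-and-aggregate idea can be sketched in a few lines of Python. The article does not describe how Ark normalizes queries, so the regular expressions in `to_template`, the `TemplateStats` fields, and the ranking by total latency in `top_targets` are assumptions chosen for illustration only.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


def to_template(sql: str) -> str:
    """Collapse literal values so that similar queries share one template."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)             # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)             # numeric literals
    sql = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", sql)  # IN (...) lists
    sql = re.sub(r"\s+", " ", sql).strip()                   # whitespace
    return sql.lower()


@dataclass
class TemplateStats:
    count: int = 0
    total_latency_s: float = 0.0

    @property
    def avg_latency_s(self) -> float:
        return self.total_latency_s / self.count


def aggregate(entries: Iterable[Tuple[str, float]]) -> Dict[str, TemplateStats]:
    """Group (sql, latency_seconds) pairs by template and accumulate statistics."""
    stats: Dict[str, TemplateStats] = defaultdict(TemplateStats)
    for sql, latency_s in entries:
        s = stats[to_template(sql)]
        s.count += 1
        s.total_latency_s += latency_s
    return dict(stats)


def top_targets(
    stats: Dict[str, TemplateStats],
    min_count: int = 1,
    min_avg_latency_s: float = 0.0,
) -> List[Tuple[str, TemplateStats]]:
    """Filter templates by count and average latency, ranked by total time spent."""
    rows = [
        (template, s)
        for template, s in stats.items()
        if s.count >= min_count and s.avg_latency_s >= min_avg_latency_s
    ]
    return sorted(rows, key=lambda row: row[1].total_latency_s, reverse=True)
```

Ranking by total latency rather than average latency is one reasonable choice: a moderately slow query that runs thousands of times often costs more than a very slow query that runs once.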
Conclusion
The Ark Platform represents a long-term, iterative effort to integrate database operations with business development and to keep improving efficiency. Future work will extend the platform across the full lifecycle, from release and SQL review through fault recovery and performance optimization, aligning platform capabilities closely with business scenarios.
Zhongtong Tech