Scaling Ops Automation on Alibaba Cloud: From Scripts to Ansible & API Gateways
This article recounts how a fintech platform migrated from manual upgrade scripts to a fully automated operations workflow using Rundeck, Ansible, DNS, API gateways, and a custom backup tool, dramatically improving deployment speed, reducing downtime, and sharing open‑source utilities for the community.
Background
Hope Finance is an internet‑based financial platform for agriculture, fully hosted on Alibaba Cloud (ECS, RDS, SLB). Rapid growth in 2017, when transaction volume exceeded 7 billion CNY, called for more efficient, stable, and secure operational practices.
Evolution of Operations
The initial process relied on scripts run from a jump server. Each component upgrade took about one hour and caused service downtime. Load‑balancer (SLB) weights were adjusted by hand in the Alibaba Cloud console, which was repetitive and error‑prone.
To automate weight changes, a script calling Alibaba Cloud APIs was written. As the system scaled, this approach became insufficient, prompting three major architectural changes:
Introduce a DNS layer and migrate scripts to Rundeck. Operators could now trigger code deployment, SLB weight updates, and DNS changes through a web UI, eliminating manual console operations.
Add a gray‑release SLB. A separate SLB instance served as a testing entry point, enabling near‑zero‑downtime releases.
Deploy an API gateway on top of Rundeck and integrate Ansible. The gateway provided fine‑grained start/stop control for services, simplifying upgrades and improving high availability.
As a result, upgrades went from about one hour per component with service downtime to five systems upgraded within two hours with minimal interruption.
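The article does not show the weight‑adjustment script, so the following is only a minimal sketch of the idea: take one backend out of rotation by setting its SLB weight to 0, upgrade it, then restore the weight. The parameter names (`Action=SetBackendServers`, `LoadBalancerId`, `BackendServers`) follow the public SLB API as commonly documented and should be checked against the current reference. The `send` and `upgrade` callables, the IDs, and the function names are hypothetical stand‑ins for the real API client and deployment step.

```python
import json
from typing import Callable, Dict, List


def build_set_weight_params(load_balancer_id: str, weights: Dict[str, int]) -> Dict[str, str]:
    """Build request parameters for an SLB SetBackendServers call.

    Assumption: weights are integers in the range 0-100, where 0 means the
    server receives no new traffic.
    """
    for server_id, weight in weights.items():
        if not 0 <= weight <= 100:
            raise ValueError(f"weight for {server_id} must be 0-100, got {weight}")
    backend_servers = [
        {"ServerId": server_id, "Weight": str(weight)}
        for server_id, weight in sorted(weights.items())
    ]
    return {
        "Action": "SetBackendServers",
        "LoadBalancerId": load_balancer_id,
        "BackendServers": json.dumps(backend_servers),
    }


def rolling_upgrade(
    load_balancer_id: str,
    server_ids: List[str],
    send: Callable[[Dict[str, str]], None],   # hypothetical: signs and sends the request
    upgrade: Callable[[str], None],           # hypothetical: deploys to one server
    normal_weight: int = 100,
) -> None:
    """Upgrade servers one at a time so the others keep serving traffic."""
    for server_id in server_ids:
        # Drain: stop routing new requests to this server.
        send(build_set_weight_params(load_balancer_id, {server_id: 0}))
        upgrade(server_id)
        # Restore: put the upgraded server back into rotation.
        send(build_set_weight_params(load_balancer_id, {server_id: normal_weight}))
```

Keeping request construction separate from sending makes the drain/restore sequence easy to test without touching a real load balancer. In a setup like the one described, a Rundeck job could call `rolling_upgrade` with a real client in place of `send`.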
Automation Attempts
1. Ansible Applications
Run inspection playbooks that detect abnormal conditions (e.g., unexpected open ports, modified critical files) and send alerts.
Automate server initialization: install Nginx/OpenResty, Tomcat, Node.js, and configure system parameters.
Batch upgrade components, apply patches, and invoke Rundeck jobs for coordinated execution.
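The inspection checks above were implemented as Ansible playbooks, which the article does not reproduce. As an illustration of the underlying logic only, the Python sketch below compares listening ports against an allowlist and compares critical files against a recorded SHA‑256 baseline. The function names, the allowlist, and the baseline format are assumptions, not the team's actual implementation.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Set


def unexpected_ports(listening: Iterable[int], allowed: Set[int]) -> List[int]:
    """Return listening ports that are not on the allowlist, sorted."""
    return sorted(set(listening) - allowed)


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def modified_files(baseline: Dict[str, str]) -> List[str]:
    """Return paths whose current digest differs from the baseline.

    A file that has been removed is also reported, since a missing
    critical file is as suspicious as a changed one.
    """
    changed = []
    for name, expected in baseline.items():
        path = Path(name)
        if not path.is_file() or file_digest(path) != expected:
            changed.append(name)
    return sorted(changed)
```

A non‑empty result from either function would be the condition that triggers an alert. How the list of listening ports is collected (for example, by parsing `ss` output on each host) is left out of the sketch.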
2. Custom Service Components
Extended OpenResty (Nginx) with useful third‑party plugins.
Added JMX modules to Tomcat for performance monitoring.
3. Operating‑System Customization
Built a custom OS image with Zabbix, Splunk Forwarder, and the Ansible SSH keys already in place.
Reduced server provisioning time from one hour to five minutes, enabling rapid scaling during emergencies.
4. Enterprise WeChat Tools
Developed lightweight WeChat applications for alert notifications, information queries, data reports, and a “one‑stop” operations platform.
Open‑Source Components
Splunk‑WeChat alert script for integrating Splunk alerts with Enterprise WeChat.
Python script that works around bugs in the SOAP‑based NamedManager API.
Zabbix scripts and templates for monitoring Alibaba Cloud RDS instances.
All components are available at https://github.com/XWJR-Ops
ModuleAB Backup Tool
ModuleAB is a backup solution built on Alibaba OSS and OAS, consisting of a central Server and distributed Agents.
Agents monitor designated directories via inotify. When a file is closed, the Agent compresses it, uploads the archive to OSS, and reports the backup to the Server.
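The Agent's per‑file step can be sketched roughly as follows. This is not ModuleAB's code: the function name and the `upload` and `report` callables are hypothetical placeholders for the OSS upload and the call to the Server, and gzip is an assumed compression format. The inotify watch itself is omitted because the Python standard library has no inotify binding. The sketch assumes the function is invoked whenever the watcher sees a file closed after writing.

```python
import gzip
import shutil
from pathlib import Path
from typing import Callable


def backup_closed_file(
    path: Path,
    upload: Callable[[Path], str],        # hypothetical: uploads the archive, returns its object key
    report: Callable[[str, str], None],   # hypothetical: tells the Server (source path, object key)
) -> str:
    """Compress a just-closed file, upload the archive, and report the backup."""
    archive = path.with_name(path.name + ".gz")
    # Stream the copy so large files are not loaded into memory at once.
    with path.open("rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    try:
        object_key = upload(archive)
    finally:
        # Remove the temporary archive whether or not the upload succeeded.
        archive.unlink()
    # Report only after a successful upload; a failed upload raises above.
    report(str(path), object_key)
    return object_key
```

Reporting only after the upload returns means the Server never records a backup that is not actually in object storage, at the cost of a retry being needed when the upload fails.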
The Server can apply lifecycle policies to move older objects from OSS to OAS for long‑term storage and retrieve them on demand.
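The article does not say how lifecycle policies are expressed. One simple possibility, shown here purely as an assumption, is an age threshold: any object uploaded more than a given number of days ago becomes a candidate for moving from OSS to OAS. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def select_for_archive(
    objects: Dict[str, datetime],   # object key -> upload time
    max_age_days: int,
    now: datetime,
) -> List[str]:
    """Return keys of objects older than max_age_days, sorted by key."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(key for key, uploaded_at in objects.items() if uploaded_at < cutoff)


# Example: with a 7-day policy, only the object from 1 January qualifies.
example_now = datetime(2018, 1, 31, tzinfo=timezone.utc)
example_objects = {
    "logs/a.gz": datetime(2018, 1, 1, tzinfo=timezone.utc),
    "logs/b.gz": datetime(2018, 1, 30, tzinfo=timezone.utc),
}
candidates = select_for_archive(example_objects, 7, example_now)
```

Passing `now` in explicitly rather than reading the clock inside the function keeps the policy deterministic and easy to test.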
ModuleAB is in public beta; the project homepage is https://github.com/ModuleAB/ModuleAB
Conclusion
The team progressed from using off‑the‑shelf tools, to customizing them, and finally to developing proprietary solutions such as ModuleAB. System architecture changes and automation tooling (Rundeck, Ansible, API gateway) are tightly coupled: each architectural shift required corresponding automation updates, and improved automation in turn enabled more stable and scalable architectures.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
