
Automating Patch Management for 10,000+ Servers with StackStorm & SaltStack

This article explains how Ctrip’s senior engineer Hu Junya built an automated operations platform using SaltStack for remote control, StackStorm for workflow orchestration, and a custom Jobs tool for batch gray releases, enabling safe, scalable patch deployment across thousands of servers.

Efficient Ops

Speaker: Hu Junya, a senior technical support engineer at Ctrip, is responsible for SaltStack, StackStorm, and other operations platforms.

Topic: an automation platform based on StackStorm for Ctrip’s operations.

The 2021 ransomware outbreak highlighted the need for rapid, automated patching to protect business services.

When a company manages thousands of servers, patching them by hand is impractical; a coordinated, low‑impact, automated approach is required.

Typical single‑server patch flow: check whether the patch is already installed; if it is not, pull the server out of the production cluster, install the patch, reboot, optionally warm up the application, and then rejoin the cluster.
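The single‑server flow can be sketched as a small function. This is a minimal illustration, not Ctrip's implementation: the step names (`is_patched`, `pull_out`, `install_patch`, `reboot`, `warm_up`, `rejoin`) and the idea of passing each step in as a callable are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical step names; each callable takes a host name and returns True on success.
Ops = Dict[str, Callable[[str], bool]]


def patch_server(host: str, ops: Ops, warm_up: bool = True) -> List[str]:
    """Run the single-server patch flow and return the steps that were performed."""
    if ops["is_patched"](host):
        return []  # patch already present, leave the server in the cluster

    order = ["pull_out", "install_patch", "reboot"]
    if warm_up:
        order.append("warm_up")  # optional warm-up before taking traffic again
    order.append("rejoin")

    done: List[str] = []
    for name in order:
        # Stop at the first failing step; a server that failed mid-flow
        # is deliberately NOT put back into the cluster.
        if not ops[name](host):
            raise RuntimeError(f"{name} failed on {host}")
        done.append(name)
    return done


# Usage with stand-in steps that always succeed.
fake_ops: Ops = {
    name: (lambda host: True)
    for name in ["pull_out", "install_patch", "reboot", "warm_up", "rejoin"]
}
fake_ops["is_patched"] = lambda host: False
print(patch_server("web-01", fake_ops))
```

Each step in this flow is the kind of atomic operation that the rest of the article maps onto remote execution and workflow actions.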

Automation must address two aspects: (1) orchestrate the entire workflow, and (2) perform remote operations without logging into each server.

Remote Control

SaltStack is an open‑source remote management platform with a master‑minion architecture. The master issues tasks; minions execute them and return results. Similar tools include Ansible, Chef, and Puppet.
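To make the master‑minion pattern concrete, here is a toy in‑process model of it. It does not use SaltStack's API; the `Master` and `Minion` classes and their methods are hypothetical, and a real deployment communicates over the network rather than through direct method calls.

```python
from typing import Any, Callable, Dict, List


class Minion:
    """Toy minion: holds named functions and runs them when asked."""

    def __init__(self, minion_id: str, functions: Dict[str, Callable[..., Any]]):
        self.minion_id = minion_id
        self.functions = functions

    def execute(self, fun: str, *args: Any) -> Dict[str, Any]:
        try:
            return {"ok": True, "ret": self.functions[fun](*args)}
        except Exception as exc:  # report the failure instead of crashing the master
            return {"ok": False, "ret": str(exc)}


class Master:
    """Toy master: issues one task to many minions and gathers their results."""

    def __init__(self) -> None:
        self.minions: Dict[str, Minion] = {}

    def accept(self, minion: Minion) -> None:
        self.minions[minion.minion_id] = minion

    def run(self, targets: List[str], fun: str, *args: Any) -> Dict[str, Dict[str, Any]]:
        # Unknown targets are skipped; results are keyed by minion id.
        return {
            t: self.minions[t].execute(fun, *args)
            for t in targets
            if t in self.minions
        }


# Usage: two minions that both expose a "test.ping"-style function.
master = Master()
for name in ("web-01", "web-02"):
    master.accept(Minion(name, {"test.ping": lambda: True}))
print(master.run(["web-01", "web-02"], "test.ping"))
```

The point of the pattern is that the caller never logs in to a server: it names the targets and the function, and reads back one result per minion.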

Workflow Orchestration

Traditional operations rely on manual steps or ad‑hoc scripts, leading to low efficiency, error‑prone processes, duplicated code, and missing audit logs.

DevOps brings many open‑source tools and custom utilities, but challenges remain: a complex change spans many tools, effort is duplicated across teams, the overall logic is hard to follow, and logs are scattered instead of unified.

StackStorm, an event‑driven automation platform, solves these problems by turning tool APIs into actions, composing actions into workflows, providing a visual UI, and centralizing logs.
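The idea of wrapping tool calls as actions, chaining them into a workflow, and logging every execution in one place can be sketched in plain Python. This is a conceptual model only, not StackStorm's API: the `action` decorator, `run_workflow`, and the action names `lb.pull_out` and `patch.install` are all hypothetical.

```python
from typing import Any, Callable, Dict, List

ACTIONS: Dict[str, Callable[..., Any]] = {}   # registry of named actions
EXECUTION_LOG: List[Dict[str, Any]] = []      # one central log for every run


def action(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function as a named action."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        ACTIONS[name] = func
        return func
    return register


def run_action(name: str, **params: Any) -> bool:
    """Run one action, record the outcome centrally, and return True on success."""
    try:
        result, status = ACTIONS[name](**params), "succeeded"
    except Exception as exc:  # unknown action or failing tool call
        result, status = str(exc), "failed"
    EXECUTION_LOG.append(
        {"action": name, "params": params, "status": status, "result": result}
    )
    return status == "succeeded"


def run_workflow(steps: List[str], **params: Any) -> bool:
    """Run actions in order and stop at the first failure."""
    return all(run_action(name, **params) for name in steps)


@action("lb.pull_out")
def pull_out(host: str) -> str:
    return f"{host} removed from pool"  # stand-in for a load balancer API call


@action("patch.install")
def install(host: str) -> str:
    return f"patch installed on {host}"  # stand-in for a remote execution call


# Usage: compose two actions into a workflow and inspect the shared log.
print(run_workflow(["lb.pull_out", "patch.install"], host="web-01"))
for entry in EXECUTION_LOG:
    print(entry["action"], entry["status"])
```

Because every action goes through `run_action`, the audit trail comes for free, which is the property the earlier paragraphs say ad‑hoc scripts lack.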

Users can trigger actions via the web UI or API, and integrate with ChatOps for rapid incident response.
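As an illustration of triggering an action through the API, the sketch below only builds the HTTP request and does not send it. The `/api/v1/executions` path and the `St2-Api-Key` header reflect my understanding of StackStorm's REST API and should be checked against the documentation for your version; the host name, the key, and the action reference `patching.patch_server` are made up for the example.

```python
import json
from typing import Any, Dict


def build_execution_request(
    base_url: str, api_key: str, action_ref: str, parameters: Dict[str, Any]
) -> Dict[str, Any]:
    """Describe the HTTP request that would start one action execution."""
    return {
        "method": "POST",
        # Assumed endpoint path; verify against your StackStorm deployment.
        "url": base_url.rstrip("/") + "/api/v1/executions",
        # Assumed API-key header name; verify against your StackStorm deployment.
        "headers": {"St2-Api-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps({"action": action_ref, "parameters": parameters}),
    }


# Usage with placeholder values (hypothetical host, key, and action reference).
request = build_execution_request(
    "https://st2.example.com/",
    "REPLACE_WITH_API_KEY",
    "patching.patch_server",
    {"host": "web-01"},
)
print(request["method"], request["url"])
print(request["body"])
```

The resulting dictionary can be handed to any HTTP client; a ChatOps bot would do the same thing in response to a chat command.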

Batch Gray Release

For operations affecting tens of thousands of servers, the custom Jobs tool adds batch gray‑release (staged rollout) capabilities on top of StackStorm. It provides three capabilities:

Automatically split targets into configurable batches (e.g., 1%, 5%, 10%).

Delegate actual execution to StackStorm actions, keeping Jobs lightweight.

Collect and display success/failure statistics for each batch.
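The batching and per‑batch statistics can be sketched as follows. This is not the Jobs tool itself: the function names are hypothetical, and two behaviors are assumptions made for the example, namely that hosts left over after the configured percentages go into one final batch, and that the rollout stops once a batch exceeds a failure‑rate threshold.

```python
import math
from typing import Callable, Dict, List, Sequence


def split_batches(targets: Sequence[str], percents: Sequence[float] = (1, 5, 10)) -> List[List[str]]:
    """Split targets into batches sized as percentages of the total.

    Assumption: whatever remains after the listed percentages forms a final batch.
    """
    total = len(targets)
    batches: List[List[str]] = []
    start = 0
    for pct in percents:
        if start >= total:
            break
        size = max(1, math.ceil(total * pct / 100))  # at least one host per batch
        batches.append(list(targets[start:start + size]))
        start += size
    if start < total:
        batches.append(list(targets[start:]))
    return batches


def run_batches(
    batches: List[List[str]],
    execute: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[Dict[str, int]]:
    """Run batches in order, record per-batch statistics, halt on too many failures.

    `execute` stands in for delegating one host to a workflow engine.
    """
    stats: List[Dict[str, int]] = []
    for index, batch in enumerate(batches, start=1):
        succeeded = sum(1 for host in batch if execute(host))
        failed = len(batch) - succeeded
        stats.append(
            {"batch": index, "size": len(batch), "succeeded": succeeded, "failed": failed}
        )
        if failed / len(batch) > max_failure_rate:
            break  # assumed policy: do not widen the rollout after a bad batch
    return stats


# Usage: 200 hypothetical hosts, one of which fails in the second batch.
hosts = [f"h{i}" for i in range(200)]
batches = split_batches(hosts)
print([len(b) for b in batches])
print(run_batches(batches, execute=lambda host: host != "h5"))
```

Starting with a small batch limits the blast radius of a bad patch, and the per‑batch counts give the operator a clear point at which to continue or roll back.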

Conclusion

To build a robust automation platform: deploy a remote‑management framework (e.g., SaltStack or Ansible), implement atomic operations as StackStorm actions, compose them into workflows, and use a batch‑gray system like Jobs for large‑scale tasks.
