From Xiaomi to a Trading Exchange: Real‑World Automation Ops Case Studies
This article presents two practical operations-automation case studies: Xiaomi's three-year journey to platform-managed, self-scheduling services, and a trading exchange's step-by-step build from zero automation. Both highlight the standards, tooling, and cultural challenges that modern ops teams face.
This article compiles a lively discussion from the Efficient Operations WeChat group, originally posted on the "Efficient Operations" public account.
Case 1: Xiaomi’s Automated Operations Development Process
The diagram shows Xiaomi's operations evolution over roughly three years: from a nascent team to a platform-management stage, and now toward system self-scheduling.
Automation work focuses on four areas, each requiring clear standards and norms:
Business deployment and change
Monitoring management and change
Capacity scaling
Fault handling
Deployment
Strict, extensible deployment standards were established early. An automated deployment system now handles about 90% of service releases, allowing developers to perform many changes themselves while ops assist with core services, greatly reducing manual effort.
Monitoring
Previously, monitoring required extensive manual effort, especially after capacity changes. To address this, Xiaomi built the open-source Falcon monitoring system (http://open-falcon.com), in which monitoring is configured once rather than updated by hand after every change.
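One way to read "configure once" is that alert rules are attached to a group of hosts rather than to each host, so a newly added host inherits them automatically. The following is a minimal sketch of that idea only; the class names, the metric name, and the threshold are illustrative assumptions and are not taken from Falcon's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlertRule:
    metric: str       # hypothetical metric name, e.g. "cpu.idle"
    operator: str     # "<" or ">"
    threshold: float


@dataclass
class HostGroup:
    name: str
    rules: list[AlertRule] = field(default_factory=list)
    hosts: set[str] = field(default_factory=set)


def rules_for_host(host: str, groups: list[HostGroup]) -> list[AlertRule]:
    """Collect every rule a host inherits from the groups it belongs to."""
    return [rule for group in groups if host in group.hosts for rule in group.rules]


# The rule is declared once, on the group.
web = HostGroup("web", rules=[AlertRule("cpu.idle", "<", 10.0)], hosts={"web-01"})

# Scaling out: joining the group is the only monitoring-related change.
web.hosts.add("web-02")
print(rules_for_host("web-02", [web]))
```

With this shape, a capacity change touches group membership only, which is the kind of step a deployment system can perform without a separate monitoring update.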
Capacity
Capacity scaling remains a challenge; although an automated release system exists, the team recognizes that there is still room for improvement.
Fault Handling
Fault‑handling processes are still being refined, with plans to automate typical incident resolutions.
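Automating typical incident resolutions often starts as a small runbook table that maps a known alert type to a scripted response and escalates everything else to a person. The sketch below illustrates that pattern under assumed names; the alert types and remediation functions are hypothetical and only return a description of the action instead of performing it.

```python
from __future__ import annotations

from typing import Callable

# A remediation takes a host name and returns a description of what it did.
Remediation = Callable[[str], str]


def restart_service(host: str) -> str:
    return f"restart service on {host}"


def clean_disk(host: str) -> str:
    return f"rotate and purge old logs on {host}"


# Hypothetical alert types mapped to their scripted responses.
RUNBOOK: dict[str, Remediation] = {
    "process.down": restart_service,
    "disk.full": clean_disk,
}


def handle_alert(alert_type: str, host: str) -> str:
    """Run the scripted fix for a known alert type, otherwise escalate."""
    action = RUNBOOK.get(alert_type)
    if action is None:
        # Unknown incidents still go to a human.
        return f"escalate {alert_type} on {host} to on-call"
    return action(host)


print(handle_alert("disk.full", "web-01"))
print(handle_alert("kernel.panic", "web-01"))
```

Keeping the escalation path as the default means only incidents with a well-understood fix are handled automatically.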
New Journey: System Self‑Scheduling
Building on established standards, Xiaomi now integrates Mesos, Marathon, Docker, and internal systems (CMDB, naming, LVS) to achieve automatic service instance scaling and resource scheduling.
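As a rough illustration of automatic instance scaling, the sketch below computes a desired instance count from observed CPU utilization and builds the body of a scale request. The target utilization, the instance bounds, and the app id are assumptions made for the example, not Xiaomi's settings. The request shape follows Marathon's documented pattern of updating an app's "instances" field via PUT on /v2/apps/{appId}; check it against the Marathon version in use. Nothing is sent over the network here.

```python
from __future__ import annotations

import math


def desired_instances(
    current: int,
    cpu_utilization: float,
    target: float = 0.6,        # assumed target utilization per instance
    min_instances: int = 2,     # assumed lower bound
    max_instances: int = 20,    # assumed upper bound
) -> int:
    """Scale the instance count proportionally to load, within fixed bounds."""
    if current < 1:
        raise ValueError("current must be at least 1")
    wanted = math.ceil(current * cpu_utilization / target)
    return max(min_instances, min(max_instances, wanted))


def marathon_scale_request(app_id: str, instances: int) -> dict:
    """Describe the HTTP request that would change the instance count."""
    return {
        "method": "PUT",
        "path": f"/v2/apps/{app_id.strip('/')}",
        "body": {"instances": instances},
    }


# Hypothetical app running hot: 4 instances at 85% CPU.
target_count = desired_instances(current=4, cpu_utilization=0.85)
print(marathon_scale_request("/demo/web", target_count))
```

The clamp to a minimum and maximum keeps a bad metric reading from scaling a service to zero or exhausting the cluster.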
Case 2: A Trading Exchange’s Zero‑Automation to Automation Journey
Environment Overview
The exchange operates hundreds of physical machines and thousands of virtual machines, a small footprint compared with large internet firms.
Zero Automation
Initially, operations relied on fragmented scripts with no central management. The team started by adopting open-source tools for host, network, and database configuration management, despite limited in-house expertise.
Open‑source adoption required learning and occasional re‑development, but provided a manageable entry point before moving to full application deployment automation.
Promoting automation met cultural resistance: concerns about maturity, the habit of working manually on the command line, and the belief that a small environment does not need automation.
How should a small‑scale, zero‑automation team start automating?
Clarify responsibilities (dev, ops, monitoring, deployment) even if initially handled by one or two people.
Select technology stack (containers vs. VMs) and ensure the team has the required skills.
Implement version and process control to enable rapid iteration.
Archive manual deployment scripts; the team currently uses git + Jenkins + Docker + Python for automated builds and on‑demand developer environments.
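The git + Jenkins + Docker + Python combination mentioned in the last step can be sketched as a small build script that a Jenkins job might call. This is an assumption about how such a pipeline could look, not the exchange's actual tooling; the image name is hypothetical. It tags each image with the current git commit so that every build traces back to a version.

```python
from __future__ import annotations

import subprocess


def current_commit(repo_dir: str = ".") -> str:
    """Return the short hash of the checked-out commit."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def docker_build_command(image: str, commit: str, context: str = ".") -> list[str]:
    """Build the docker CLI invocation, tagging the image with the commit."""
    return ["docker", "build", "-t", f"{image}:{commit}", context]


def build(image: str, repo_dir: str = ".") -> str:
    """Build an image from the repository and return its full tag."""
    commit = current_commit(repo_dir)
    subprocess.run(docker_build_command(image, commit, repo_dir), check=True)
    return f"{image}:{commit}"


# Show the command for a hypothetical image without running docker.
print(docker_build_command("registry.example.com/demo-app", "abc1234"))
```

Calling build("registry.example.com/demo-app") inside a checked-out repository would run the real build, which requires git and docker on the build host.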
Key Takeaways
Automation, like cloud computing, is a concept rather than a goal in itself; focus on the efficiency requirements that drive operational needs.
Strive for extreme efficiency; true automation emerges when actions are driven by those needs rather than dictated by the concept.
Even without formal automation, improving processes, reducing errors, and increasing efficiency can make a team “more automated” than those shouting about it.
Never let automation replace system understanding; maintain control to avoid hidden pitfalls.
Building a Collaborative Future
The Efficient Operations community, founded in April 2015, now includes over 800 members, with more than 300 senior ops leaders, fostering knowledge sharing and continuous improvement.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.