
From Xiaomi to a Trading Exchange: Real‑World Automation Ops Case Studies

This article presents two practical automation operations case studies—Xiaomi's three‑year journey to platform‑managed, self‑scheduling services and a trading exchange's step‑by‑step build from zero automation—highlighting standards, tooling, and cultural challenges for modern ops teams.

Efficient Ops

This article compiles a lively discussion from the Efficient Operations WeChat group, originally posted on the "Efficient Operations" public account.

Case 1: Xiaomi’s Automated Operations Development Process

The diagram shows how Xiaomi’s operations evolved over roughly three years: from a nascent team, to a platform‑management stage, and now toward system self‑scheduling.

Automation work focuses on four areas, each requiring clear standards and norms:

Business deployment and change

Monitoring management and change

Capacity scaling

Fault handling

Deployment

Strict, extensible deployment standards were established early. An automated deployment system now handles about 90% of service releases, allowing developers to perform many changes themselves while ops assist with core services, greatly reducing manual effort.
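One way to picture what a deployment standard buys: if every service ships a descriptor with the same required fields, a release tool can check it mechanically and decide whether developers may release on their own. The sketch below is purely illustrative. The field names, `validate_descriptor`, and `can_self_service` are assumptions, not Xiaomi's actual standard or tooling.

```python
from typing import Dict, List, Set

# Hypothetical required fields; the article does not list the real standard.
REQUIRED_FIELDS = ("name", "version", "start_cmd", "stop_cmd", "health_check")


def validate_descriptor(descriptor: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the descriptor conforms."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = descriptor.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def can_self_service(descriptor: Dict[str, str], core_services: Set[str]) -> bool:
    """Developers release on their own unless the service is marked as core."""
    if validate_descriptor(descriptor):
        return False  # non-conforming services never go through self-service
    return descriptor["name"] not in core_services
```

With a rule like this, ops only need to step in for the services listed as core, which matches the split of work described above.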

Monitoring

Previously, monitoring required extensive manual effort, especially after capacity changes. Xiaomi built the open‑source Falcon monitoring system (http://open-falcon.com), which only needs to be configured once, so monitoring no longer has to be updated by hand after every change.
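As an illustration of how a host can report data without anyone touching the monitoring configuration, the sketch below builds a custom data point and posts it to a local Open‑Falcon agent. The push address and field names follow the agent's documented push interface as best understood here; treat them, and the helper names `build_metric` and `push`, as assumptions to verify against your own deployment.

```python
import json
import time
import urllib.request

# Assumed default address of the local agent's push interface.
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_metric(endpoint, metric, value, step=60, tags="", timestamp=None):
    """Build one data point in the shape the agent's push interface expects."""
    return {
        "endpoint": endpoint,          # usually the hostname
        "metric": metric,              # metric name, e.g. "queue.length"
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "step": step,                  # reporting interval in seconds
        "value": value,
        "counterType": "GAUGE",        # report the value as-is
        "tags": tags,                  # comma-separated "key=value" pairs
    }


def push(metrics, url=AGENT_PUSH_URL, timeout=5):
    """POST a list of data points to the agent; raises on network errors."""
    request = urllib.request.Request(
        url,
        data=json.dumps(metrics).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A cron job or service wrapper could call `push([build_metric("web-01", "queue.length", 7, tags="service=feed")])` on every interval.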

Capacity

Capacity scaling remains a challenge: an automated release system exists, but the team recognizes there is still considerable room for improvement.

Fault Handling

Fault‑handling processes are still being refined, with plans to automate typical incident resolutions.
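Automating typical incident resolutions can start as a simple mapping from alert type to a scripted action, with everything else escalated to a person. This is a minimal sketch under that assumption. The alert types, actions, and function names are hypothetical and do not describe Xiaomi's system.

```python
from typing import Callable, Dict


def restart_service(alert: dict) -> str:
    """Hypothetical action for a crashed process."""
    return f"restart {alert['service']} on {alert['host']}"


def clean_disk(alert: dict) -> str:
    """Hypothetical action for a full disk."""
    return f"rotate logs on {alert['host']}"


# Only well-understood, repeatable incidents get a runbook.
RUNBOOKS: Dict[str, Callable[[dict], str]] = {
    "process_down": restart_service,
    "disk_full": clean_disk,
}


def handle(alert: dict) -> str:
    """Run the matching runbook, or hand the alert to a person."""
    runbook = RUNBOOKS.get(alert.get("type"))
    if runbook is None:
        return f"escalate to on-call: {alert.get('type', 'unknown')}"
    return runbook(alert)
```

Keeping the escalation path as the default means the automation never guesses at incidents it has no runbook for.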

New Journey: System Self‑Scheduling

Building on established standards, Xiaomi now integrates Mesos, Marathon, Docker, and internal systems (CMDB, naming, LVS) to achieve automatic service instance scaling and resource scheduling.
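To make automatic instance scaling concrete, the sketch below derives an instance count from current load and describes the request that would ask Marathon to apply it. The sizing rule, the limits, and both function names are assumptions for illustration. The request shape (a `PUT` to `/v2/apps/<app id>` with an `instances` field) follows Marathon's REST API as best understood here and should be checked against the version you run.

```python
import json
import math


def desired_instances(current_qps, qps_per_instance, minimum=2, maximum=50):
    """Instances needed to serve current_qps, clamped to [minimum, maximum]."""
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    needed = math.ceil(current_qps / qps_per_instance)
    return max(minimum, min(maximum, needed))


def scale_request(marathon_url, app_id, instances):
    """Describe the HTTP call that asks Marathon to run `instances` copies."""
    return {
        "method": "PUT",
        "url": f"{marathon_url.rstrip('/')}/v2/apps/{app_id.strip('/')}",
        "body": json.dumps({"instances": instances}),
    }
```

Separating the decision (`desired_instances`) from the call (`scale_request`) keeps the sizing rule easy to test without a running cluster.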

Case 2: A Trading Exchange’s Zero‑Automation to Automation Journey

Environment Overview

The exchange operates hundreds of physical machines and thousands of virtual machines—small compared to large internet firms.

Zero Automation

Initially, operations relied on fragmented scripts with no central management. The team started by adopting open‑source tools for host, network, and database configuration management, despite limited internal expertise.

Open‑source adoption required learning and occasional re‑development, but provided a manageable entry point before moving to full application deployment automation.

Promoting automation faced cultural resistance: concerns about tool maturity, the habit of manual command‑line work, and the belief that a small environment does not need automation.

How should a small‑scale, zero‑automation team start automating?

Clarify responsibilities (dev, ops, monitoring, deployment) even if initially handled by one or two people.

Select technology stack (containers vs. VMs) and ensure the team has the required skills.

Implement version and process control to enable rapid iteration.

Archive manual deployment scripts; the team currently uses git + Jenkins + Docker + Python for automated builds and on‑demand developer environments.
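The git + Jenkins + Docker + Python combination in the last step could be glued together roughly as follows: a Python helper turns a git commit into a traceable image tag and produces the commands a CI job would run. This is a sketch, not the exchange's actual pipeline. The repository name, the naming scheme, and all function names are hypothetical.

```python
import subprocess


def image_tag(repo, commit):
    """Tag images with the short git commit so every build is traceable."""
    return f"{repo}:{commit[:12]}"


def build_commands(repo, commit, context="."):
    """Commands a CI job (for example a Jenkins shell step) could run in order."""
    tag = image_tag(repo, commit)
    return [
        ["docker", "build", "-t", tag, context],
        ["docker", "push", tag],
    ]


def dev_env_command(repo, commit, developer):
    """Start a throwaway container for one developer from a built image."""
    name = f"dev-{developer}-{commit[:12]}"
    return ["docker", "run", "-d", "--name", name, image_tag(repo, commit)]


def run_all(commands):
    """Execute commands in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Because the commands are plain argument lists, they can be reviewed and versioned in git alongside the archived manual scripts before anything is executed.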

Key Takeaways

Automation, like cloud computing, is only a concept; focus instead on the efficiency requirements that drive operational needs.

Strive for extreme efficiency; true automation emerges when actions are driven by that goal rather than dictated by a concept.

Even without formal automation, improving processes, reducing errors, and increasing efficiency can make a team “more automated” than those shouting about it.

Never let automation replace system understanding; maintain control to avoid hidden pitfalls.

Building a Collaborative Future

The Efficient Operations community, founded in April 2015, now includes over 800 members, with more than 300 senior ops leaders, fostering knowledge sharing and continuous improvement.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
