From Fire‑Fighting to Proactive Delivery: How Meizu Built a Cloud‑Native CI/CD Ops Platform
Meizu's operations team moved from reactive firefighting to proactive delivery by building a cloud-native continuous integration platform. This article covers their automation journey, the challenges that came with growth, the platform components, the evolution of the release process, and intelligent ops, which together improve quality, efficiency, cost control, and security.
Introduction
The value of operations does not lie in taking the blame, filling holes, or firefighting; what matters is the ability to respond proactively to change and risk. Meizu's ops team built a continuous-integration cloud delivery platform to improve that adaptability and to give users and product teams an efficient delivery experience.
Automation Construction Timeline
2003‑2008 (Internet 1.0) : Services limited to website and BBS, using PHP + MySQL.
2009‑2011 (Internet 2.0) : Introduced LVS architecture, master‑slave DB design, but all services ran in a single IDC.
2012‑2013 (Internet 2.5) : Added application center, multimedia, O2O; implemented sharding, routing, Redis cluster, MooseFS, and various MQ services.
2014 onward (Internet 3.0) : Internet business became a core revenue stream.
Challenges Brought by Growth
Four dimensions were analyzed:
Quality : Measured by availability metrics (direct monitoring of network, services, applications, systems; indirect metrics such as response speed and SMS delivery rate). Early on, monitoring coverage was low and noisy, leading to mistrust.
Efficiency : Deliveries and changes were frequent, but the processes were not tied into automation, so overall efficiency was low.
Cost : Lack of transparent capacity planning caused “filling holes”, “firefighting”, and “taking the blame” to become routine.
Security : Early security policies were established; later a comprehensive security system covered system, data, and application layers.
Current Ops Platform
The platform consists of several subsystems:
Resource Management : Built a cloud platform with KVM + Docker, managed compute and network resources via CMDB.
Configuration Management : Managed LVS, CDN, DNS and exposed fine‑grained APIs for permission control.
Automation System : Included ticketing, logging, release channels, and self‑developed ops pipelines with automatic inspection.
Monitoring & Capacity : Provided basic, custom, business, and capacity monitoring to evaluate resource needs and control costs.
Security System : Access through a bastion host, self‑developed WAF, vulnerability management, and automated patch tracking.
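As a rough illustration of how the capacity monitoring above could be used to evaluate resource needs, the sketch below fits a straight line to daily usage samples and estimates how many days remain before a resource fills up. The function name, the sample numbers, and the choice of a linear trend are all assumptions made for illustration; the source does not describe how Meizu's capacity evaluation works.

```python
def days_until_full(daily_usage, capacity):
    """Estimate days until `capacity` is reached, using a least-squares line.

    `daily_usage` is a list of one sample per day (same unit as `capacity`).
    Returns None when usage is flat or shrinking. Hypothetical helper.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(daily_usage) / n
    # Ordinary least-squares slope: growth per day.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_usage))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    return max(0.0, (capacity - daily_usage[-1]) / slope)


# Made-up samples: disk usage in GB over five days on a 1000 GB volume.
print(days_until_full([500, 510, 520, 530, 540], 1000))
```

A real capacity model would likely account for seasonality and business events; a straight line is only the simplest baseline.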
Release Platform Evolution
The release process moved from weekly to daily and finally to self‑service releases. Manual operations were replaced by automation tools that dispatch commands and scripts to servers. Integration of CMDB business trees and defined release standards raised success rates above 98% and enabled over 90% of releases without ops involvement.
Delivery Pipeline
Three environments are used: development, testing, and production. Code is built with Jenkins, issues are tracked in Redmine, and automated deployments provide logging, alerting, and rapid scaling. A well-balanced technical environment and stable owners for each framework are essential for keeping documentation current and preserving knowledge over time.
Value Framework
The cloud platform must automate environment provisioning to ensure standardized deliveries. A unified development framework, driven by a technical committee, guarantees a consistent tech stack and automated processes. Core principles of the delivery pipeline include standardization, automation, and repeatability, covering parallel development, compilation, unit testing, system and integration testing, rollback, and production monitoring.
Standardization, Automation, and Intelligence
The automation effort proceeds in three stages: standardization (hardware, components, tech stack, monitoring), automation (unit tests, coverage, admission criteria), and intelligence (data-driven learning and prediction). Two technical options were considered: adopting a full open-source stack (Docker + Elasticsearch) or extending the existing platforms. The latter was chosen to minimize disruption.
Key practices include a unified entry point that calls Jenkins APIs, synchronizing bug information with Redmine, and consolidating user information across development, testing, and ops roles.
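The unified entry point that calls Jenkins APIs might look roughly like the sketch below. It prepares, without sending, a POST to Jenkins's `buildWithParameters` remote-access endpoint using HTTP basic authentication with a user API token. The host name, job name, parameter names, and credentials are hypothetical placeholders, and the source does not say which Jenkins endpoints Meizu's platform actually calls.

```python
import base64
import urllib.parse
import urllib.request


def build_jenkins_trigger(base_url, job, params, user, api_token):
    """Prepare (but do not send) a request that queues a parameterized build."""
    query = urllib.parse.urlencode(params)
    url = (
        f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job)}"
        f"/buildWithParameters?{query}"
    )
    request = urllib.request.Request(url, data=b"", method="POST")
    # Jenkins accepts "user:api_token" as HTTP basic credentials.
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    return request


# All values below are made up for illustration.
req = build_jenkins_trigger(
    "https://jenkins.example.internal",
    "app-center-build",
    {"BRANCH": "release", "REDMINE_ISSUE": "1234"},
    "ops-bot",
    "secret-token",
)
print(req.get_method(), req.full_url)
# Passing `req` to urllib.request.urlopen would actually queue the build.
```

Carrying the Redmine issue number as a build parameter is one possible way to keep bug information linked to builds; it is an assumption here, not a detail from the source.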
Continuous Integration Process
Requirement Phase : Product owners submit requirements; development leads analyze and schedule delivery.
Development Phase : Code writing, committing, building, static scanning, and coverage analysis.
Testing Phase : Deploy test environment, run automated security and performance tests, perform manual verification, and return to development if criteria are not met.
Release Phase : Audit, gray‑environment deployment, additional automated tests, and final production release.
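The four phases above, including the rule that a change failing its test criteria returns to development, can be sketched as a small state machine. The class and function names are hypothetical, and the behaviour for failures outside the testing phase (the change simply stays where it is) is an assumption, since the source only describes the testing-to-development loop.

```python
from dataclasses import dataclass, field

PHASES = ["requirement", "development", "testing", "release"]


@dataclass
class Change:
    name: str
    phase: str = "requirement"
    history: list = field(default_factory=list)


def advance(change, passed):
    """Record the gate result for the current phase and return the next phase."""
    if change.phase == "done":
        return change.phase
    change.history.append((change.phase, passed))
    index = PHASES.index(change.phase)
    if passed:
        # Move forward; after the release phase the change is finished.
        change.phase = PHASES[index + 1] if index + 1 < len(PHASES) else "done"
    elif change.phase == "testing":
        # Admission criteria not met: hand the change back to development.
        change.phase = "development"
    return change.phase


change = Change("login-fix")
for result in [True, True, False, True, True, True]:
    print(advance(change, result))
```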
Release Procedure
Environment check (user directories, permissions).
Fetch artifacts from the packaging platform.
Temporarily disable monitoring to avoid false alarms.
Take the web service offline.
Stop the service to release file locks.
Update files.
Start the service.
Perform monitoring checks to verify availability.
Bring the web service back online via LVS.
Re‑enable monitoring.
This fine‑grained process ensures high success rates while supporting parallel or sequential releases as needed.
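The ten steps above could be orchestrated per host along the lines of the sketch below. The step names mirror the list, but the function names and the `actions` mapping are hypothetical. One design choice here is an assumption rather than something the source states: monitoring is re-enabled in a `finally` block, so that a failed release leaves the host offline but visible to alerting instead of silently unmonitored.

```python
STEPS_BEFORE = ["check_environment", "fetch_artifacts", "disable_monitoring"]
GUARDED_STEPS = [
    "take_offline",
    "stop_service",
    "update_files",
    "start_service",
    "health_check",
    "bring_online",
]
FINAL_STEP = "enable_monitoring"


def release_host(host, actions):
    """Run the release steps in order; `actions` maps step name -> callable(host)."""
    for step in STEPS_BEFORE:
        actions[step](host)
    try:
        # A failure here (for example in health_check) skips bring_online,
        # so a broken host never goes back into rotation.
        for step in GUARDED_STEPS:
            actions[step](host)
    finally:
        actions[FINAL_STEP](host)


def recording_actions(log, failing=None):
    """Fake actions for a dry run: record each step, optionally fail one."""
    def make(step):
        def action(host):
            log.append(step)
            if step == failing:
                raise RuntimeError(f"{step} failed on {host}")
        return action
    return {step: make(step) for step in STEPS_BEFORE + GUARDED_STEPS + [FINAL_STEP]}


log = []
release_host("web-01", recording_actions(log))
print(log)
```

Sequential releases would call `release_host` for one host at a time, while parallel releases could submit the same function to a thread pool, one task per host.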
Intelligent Ops
By collecting operational data, the team can learn patterns and predict failures, which enables proactive maintenance. For example, a high disk replacement rate signals imminent disk failures, and error rates on key switches can forecast data-center outages.
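As a deliberately simple stand-in for this kind of prediction, the sketch below flags a metric (for instance, weekly disk replacements) when the latest value rises well above its historical baseline. This is a fixed statistical threshold, not the learning approach the team describes; the function name, the sample data, and the factor `k` are all assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(history, latest, k=3.0):
    """Return True when `latest` exceeds the baseline mean by k standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = pstdev(history)
    # With a perfectly flat history, any increase at all counts as anomalous.
    return latest > baseline + k * spread


# Made-up numbers: disk replacements per week in one data center.
weekly_replacements = [2, 3, 2, 4, 3, 2]
print(is_anomalous(weekly_replacements, 9))
print(is_anomalous(weekly_replacements, 4))
```

The same check could be applied to switch error rates; a production system would more likely use trend or seasonal models over much longer histories.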
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
