
From Fire‑Fighting to Proactive Delivery: How Meizu Built a Cloud‑Native CI/CD Ops Platform

Meizu’s operations team moved from reactive firefighting to proactive delivery by building a cloud‑native continuous integration platform. This article describes the team’s automation journey, the challenges that came with growth, the platform’s components, the evolution of its releases, and intelligent ops, which together improve quality, efficiency, cost control, and security.

Efficient Ops

Introduction

The value of operations does not lie in taking the blame, filling holes, or firefighting; the ability to respond proactively to change and risk is what matters. Meizu’s ops team built a continuous‑integration cloud delivery platform to improve its adaptability and to give users and product teams an efficient delivery experience.

Automation Build‑Out Timeline

2003‑2008 (Internet 1.0): Services were limited to the website and BBS, built on PHP + MySQL.

2009‑2011 (Internet 2.0): Introduced an LVS architecture and a master‑slave database design, but all services still ran in a single IDC.

2012‑2013 (Internet 2.5): Added the application center, multimedia, and O2O services; implemented sharding, routing, a Redis cluster, MooseFS, and various MQ services.

2014 onward (Internet 3.0): Internet business became a core revenue stream.

Challenges Brought by Growth

Four dimensions were analyzed:

Quality: Measured by availability metrics, both direct (monitoring of the network, services, applications, and systems) and indirect (such as response speed and SMS delivery rate). Early on, monitoring coverage was low and alerts were noisy, so the team came to distrust them.

Efficiency: Deliveries and changes were frequent but not backed by automation, so overall efficiency was low.

Cost: Without transparent capacity planning, “filling holes”, “firefighting”, and “taking the blame” became routine.

Security: Basic security policies were established early on; later, a comprehensive security system was built covering the system, data, and application layers.
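The article does not say how the team tamed noisy alerts. One common approach, shown here purely as an illustrative assumption, is to compute availability as a success ratio and to raise an alert only after several consecutive failed checks, so a single blip does not page anyone. The function and class names and the default threshold of 3 are hypothetical.

```python
from dataclasses import dataclass, field


def availability(successes: int, total: int) -> float:
    """Availability as a success ratio (probes answered, SMS delivered, ...)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


@dataclass
class DebouncedAlert:
    """Alert only after `threshold` consecutive failed checks (hypothetical policy)."""
    threshold: int = 3
    failures: int = field(default=0, init=False)
    firing: bool = field(default=False, init=False)

    def observe(self, check_ok: bool) -> bool:
        """Record one check result; return True only when the alert starts firing."""
        if check_ok:
            # Any success resets the failure streak and clears an active alert.
            self.failures = 0
            self.firing = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.firing:
            self.firing = True
            return True
        return False
```

With a threshold of 3, two isolated failures never fire; the alert fires once, on the third failure in a row, and stays quiet until a success resets it.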

Current Ops Platform

The platform consists of several subsystems:

Resource Management: A cloud platform built on KVM + Docker, with compute and network resources managed through the CMDB.

Configuration Management: Manages LVS, CDN, and DNS configuration and exposes fine‑grained APIs for permission control.

Automation System: Includes ticketing, logging, release channels, and self‑developed ops pipelines with automatic inspection.

Monitoring & Capacity: Provides basic, custom, business, and capacity monitoring to evaluate resource needs and control costs.

Security System: Access goes through a bastion host, backed by a self‑developed WAF, vulnerability management, and automated patch tracking.
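Capacity monitoring feeds a resource estimate. A minimal sketch, assuming a simple headroom rule (size the fleet so each instance stays at or below a target utilization at peak, then add spare instances), is instances = ceil(peak_qps / (qps_per_instance × target_utilization)) + spare. The parameter names and defaults are hypothetical, not Meizu’s actual model.

```python
import math


def instances_needed(peak_qps: float, qps_per_instance: float,
                     target_utilization: float = 0.7, spare: int = 1) -> int:
    """Hypothetical headroom rule:
    ceil(peak_qps / (qps_per_instance * target_utilization)) + spare."""
    if qps_per_instance <= 0 or not 0 < target_utilization <= 1 or spare < 0:
        raise ValueError("invalid capacity parameters")
    # Instances needed so that none runs above the target utilization at peak.
    base = math.ceil(peak_qps / (qps_per_instance * target_utilization))
    return base + spare
```

For example, a 7,200 QPS peak served by instances that handle 1,000 QPS each, kept at 50% utilization with one spare, gives ceil(14.4) + 1 = 16 instances.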

Release Platform Evolution

The release process moved from weekly releases to daily releases and finally to self‑service releases. Manual operations were replaced by automation tools that dispatch commands and scripts to servers. Integrating the CMDB business tree and defining release standards raised the release success rate above 98% and allowed more than 90% of releases to proceed without ops involvement.
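The article credits the CMDB business tree for part of that success rate but does not describe its schema. As a hypothetical sketch, a business tree can be modeled as nested nodes (product, service, cluster) with hosts attached, so a release tool resolves its target hosts from a tree path instead of a hand‑typed list. The class, node names, and IP addresses are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One node of a hypothetical CMDB business tree (product -> service -> cluster)."""
    name: str
    hosts: list[str] = field(default_factory=list)
    children: list[TreeNode] = field(default_factory=list)

    def find(self, path: list[str]) -> TreeNode | None:
        """Walk down the tree by child names, e.g. ["appstore", "api"]."""
        node = self
        for part in path:
            node = next((c for c in node.children if c.name == part), None)
            if node is None:
                return None
        return node

    def all_hosts(self) -> list[str]:
        """Collect hosts on this node and every descendant, depth-first."""
        result = list(self.hosts)
        for child in self.children:
            result.extend(child.all_hosts())
        return result
```

Releasing to a service node then targets every host beneath it, so adding a cluster to the tree automatically brings it into the next release.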

Delivery Pipeline

Three environments are used: development, testing, and production. Code is built with Jenkins and deployed via Redmine, and automated deployments provide logging, alerting, and rapid scaling. A balanced technical environment and stable framework owners are essential for keeping documentation current and preserving knowledge continuity.
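One way to express the three environments in code is to build an artifact once and promote that same version through development, testing, and production, logging each step. This is a sketch under that assumption; Artifact and deploy_next are illustrative names, and the actual Jenkins and Redmine integration is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Promotion order described in the article.
ENVIRONMENTS = ["development", "testing", "production"]


@dataclass
class Artifact:
    """A build output (e.g. from Jenkins) identified by an immutable version."""
    name: str
    version: str
    deployed_to: list[str] = field(default_factory=list)


def deploy_next(artifact: Artifact, log: list[str]) -> str:
    """Deploy the same artifact version to the next environment and log it."""
    stage = len(artifact.deployed_to)
    if stage >= len(ENVIRONMENTS):
        raise ValueError(f"{artifact.name}:{artifact.version} is already in production")
    env = ENVIRONMENTS[stage]
    artifact.deployed_to.append(env)
    log.append(f"deployed {artifact.name}:{artifact.version} to {env}")
    return env
```

Because the version never changes between environments, what reaches production is exactly what was tested.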

Value Framework

The cloud platform must automate environment provisioning to ensure standardized deliveries. A unified development framework, driven by a technical committee, guarantees a consistent tech stack and automated processes. Core principles of the delivery pipeline include standardization, automation, and repeatability, covering parallel development, compilation, unit testing, system and integration testing, rollback, and production monitoring.

Standardization, Automation, and Intelligence

The journey is divided into three stages: standardization (hardware, components, tech stack, monitoring), automation (unit tests, coverage, admission criteria), and intelligence (data‑driven learning and prediction). Two technical options were considered: adopting a full open‑source stack (Docker + Elasticsearch) or extending the existing platforms. The team chose the latter to minimize disruption.
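The automation stage relies on unit tests, coverage, and admission criteria. A minimal admission gate, assuming a build report with test counts and line coverage and a hypothetical 60% coverage threshold (the article gives no number), could look like this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BuildReport:
    """Hypothetical summary of one CI build."""
    tests_run: int
    tests_failed: int
    line_coverage: float  # fraction between 0.0 and 1.0


def admit(report: BuildReport, min_coverage: float = 0.6) -> tuple[bool, list[str]]:
    """Apply admission criteria; return (admitted, reasons for rejection)."""
    reasons = []
    if report.tests_run == 0:
        reasons.append("no unit tests ran")
    if report.tests_failed > 0:
        reasons.append(f"{report.tests_failed} unit test(s) failed")
    if report.line_coverage < min_coverage:
        reasons.append(f"coverage {report.line_coverage:.0%} below {min_coverage:.0%}")
    return (not reasons, reasons)
```

Returning every reason, rather than stopping at the first, lets developers fix all problems in one pass.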

Key practices include a unified entry point that calls Jenkins APIs, synchronizing bug information with Redmine, and consolidating user information across development, testing, and ops roles.
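For the unified entry point, Jenkins’ remote access API lets a client trigger a parameterized job with an authenticated POST to /job/<name>/buildWithParameters. The sketch below only builds the request with the Python standard library; the server URL, job name, parameter names, and credentials are placeholders, and depending on the Jenkins security configuration a CSRF crumb may also be required.

```python
from __future__ import annotations

import base64
import urllib.parse
import urllib.request


def jenkins_build_request(base_url: str, job: str, params: dict[str, str],
                          user: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Jenkins' buildWithParameters endpoint."""
    url = (f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job)}/buildWithParameters?"
           + urllib.parse.urlencode(params))
    # HTTP basic auth with a Jenkins user name and API token.
    token = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"})
```

Sending it is a matter of urllib.request.urlopen(req); jobs inside folders use nested /job/<folder>/job/<name> paths.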

Continuous Integration Process

Requirement Phase: Product owners submit requirements; development leads analyze and schedule delivery.

Development Phase: Code writing, committing, building, static scanning, and coverage analysis.

Testing Phase: Deploy test environment, run automated security and performance tests, perform manual verification, and return to development if criteria are not met.

Release Phase: Audit, gray‑environment deployment, additional automated tests, and final production release.
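The four phases form a simple state machine with one backward edge: a testing phase that fails its criteria returns the work to development. In the sketch below, phases other than testing stay put when their criteria are not met; that behavior is an assumption, since the article only describes the testing loop.

```python
from enum import Enum


class Phase(Enum):
    REQUIREMENT = "requirement"
    DEVELOPMENT = "development"
    TESTING = "testing"
    RELEASE = "release"
    DONE = "done"


# Forward path through the phases described above.
NEXT = {
    Phase.REQUIREMENT: Phase.DEVELOPMENT,
    Phase.DEVELOPMENT: Phase.TESTING,
    Phase.TESTING: Phase.RELEASE,
    Phase.RELEASE: Phase.DONE,
}


def advance(phase: Phase, criteria_met: bool) -> Phase:
    """Move forward when criteria are met; failed testing goes back to development."""
    if phase is Phase.DONE:
        return phase
    if criteria_met:
        return NEXT[phase]
    if phase is Phase.TESTING:
        return Phase.DEVELOPMENT
    return phase  # assumption: other phases retry in place
```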

Release Procedure

1. Environment check (user directories, permissions).

2. Fetch artifacts from the packaging platform.

3. Temporarily disable monitoring to avoid false alarms.

4. Take the web service offline.

5. Stop the service to release file locks.

6. Update files.

7. Start the service.

8. Perform monitoring checks to verify availability.

9. Bring the web service back online via LVS.

10. Re‑enable monitoring.

This fine‑grained process ensures high success rates while supporting parallel or sequential releases as needed.
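The ten release steps can be sketched as an ordered runner. Here run(host, step) is a hypothetical executor (for example, one that dispatches a script to the host) returning True on success, and the step names are invented labels for the listed steps. Re‑enabling monitoring sits in a finally block so it happens even when a step fails, and a host that fails its health check is never put back into LVS. The parallel parameter is an illustrative way to choose between sequential and parallel releases.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical labels for the first nine steps; the tenth
# (re-enabling monitoring) runs in release_host's finally block.
STEPS = [
    "check_environment",   # user directories, permissions
    "fetch_artifact",      # from the packaging platform
    "disable_monitoring",  # avoid false alarms during the release
    "offline_web",         # take the web service offline
    "stop_service",        # release file locks
    "update_files",
    "start_service",
    "check_health",        # verify availability
    "online_lvs",          # bring the web service back online via LVS
]

Runner = Callable[[str, str], bool]


def release_host(host: str, run: Runner) -> bool:
    """Run each step on one host in order; stop at the first failure."""
    try:
        for step in STEPS:
            if not run(host, step):
                return False  # the host stays out of LVS until investigated
        return True
    finally:
        # Always re-enable monitoring, even when a step failed.
        run(host, "enable_monitoring")


def release(hosts: list[str], run: Runner, parallel: int = 1) -> dict[str, bool]:
    """Release hosts one by one (parallel=1) or `parallel` at a time."""
    if parallel <= 1:
        results: dict[str, bool] = {}
        for host in hosts:
            results[host] = release_host(host, run)
            if not results[host]:
                break  # halt a sequential rollout at the first failed host
        return results
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return dict(zip(hosts, pool.map(lambda h: release_host(h, run), hosts)))
```

A sequential rollout stops at the first failed host so the problem can be investigated before it spreads; a parallel rollout releases every host and reports per‑host results.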

Intelligent Ops

By collecting operational data, the team can learn patterns and predict failures—e.g., high disk replacement rates signal imminent disk failures, and key switch error rates can forecast data‑center outages—enabling proactive maintenance.
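The article does not describe the prediction model. As a deliberately simple stand‑in, the sketch below flags a component (a switch, or a batch of disks) when its moving‑average error rate over the last few intervals exceeds a threshold; RateWatcher is a hypothetical name, and the window size and threshold are tuning parameters that a real system would derive from historical data.

```python
from collections import deque


class RateWatcher:
    """Flag a component when its moving-average error rate crosses a threshold."""

    def __init__(self, window: int, threshold: float):
        self.samples: deque = deque(maxlen=window)  # keeps only the last `window` rates
        self.threshold = threshold

    def add(self, errors: int, total: int) -> bool:
        """Record one interval (e.g. port errors / packets); return True if at risk."""
        self.samples.append(errors / total if total else 0.0)
        full = len(self.samples) == self.samples.maxlen
        # Only judge once a full window of history is available.
        return full and sum(self.samples) / len(self.samples) > self.threshold
```

With a window of 3 and a 1% threshold, error counts of 0, 5, 10, and 40 per 1,000 packets stay quiet until the fourth interval, when the average of the last three rates reaches about 1.8%.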

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
