Design and Implementation of Meituan's Self‑Developed CI/CD Pipeline Engine
After nearly three years of development, Meituan replaced Jenkins with a self‑built, distributed CI/CD pipeline engine. The engine unifies backend infrastructure and processes nearly 100,000 executions per day with a success rate above 99.99%. It decouples scheduling from resource allocation, uses label‑based resource pools and a layered component SDK, supports multiple languages, and is planned to extend toward serverless.
After nearly three years of development and refinement, Meituan's pipeline engine has unified the backend infrastructure, supporting almost 100,000 pipeline executions per day with a system success rate above 99.99%.
Background: Continuous delivery was first proposed in 2006 and has since become essential for many technology teams. Meituan initially used Jenkins to enable continuous delivery quickly, but as the business grew, this “quick‑setup” approach showed serious drawbacks: no unified standards, high build‑out costs, uneven quality, and open‑source tools hitting performance limits under heavy load.
Evolution Stages:
2014‑2015: Built a unified Jenkins cluster to solve common problems such as SSO, code‑repository integration, notifications, and dynamic agent scaling.
2016‑2018: Split it into multiple Jenkins clusters to relieve the single‑cluster bottleneck, which in turn brought operational complexity and security issues.
2019‑present: Built an in‑house distributed pipeline engine (internal project name Pipeline) to eliminate single‑machine bottlenecks and duplicated tooling.
The engine now serves all Meituan businesses (e.g., in‑store, delivery, Dazhong Dianping, Meituan Select, autonomous delivery vehicles, and the basic R&D platform) and supports Java, C++, NodeJS, Golang, etc.
Problem & Approach
Major Challenges
Scheduling efficiency bottleneck: Short‑lived jobs (seconds to minutes) are sensitive to scheduling latency; existing open‑source tools use a monolithic, serial scheduling model.
Resource allocation: Jobs far outnumber resources; dynamic scaling alone cannot guarantee timely execution, and pre‑deployed resources need balanced partitioning.
Tool heterogeneity: Diverse tools require a plugin‑style architecture that hides implementation differences from pipeline authors.
Solution Overview
Separate scheduling decision from resource allocation, allowing both modules to scale horizontally.
Introduce a resource‑pool management model with label‑based matching between jobs and pools.
Adopt a layered component design (business layer, system‑interaction layer, execution‑resource layer) to accommodate tool differences.
Overall Architecture
The engine consists of five core modules:
Trigger: Handles various trigger sources (PR, push, API, cron).
Task Center: Manages pipeline instances, job status, retries, and result reporting.
Decision Engine: Determines which waiting jobs can be scheduled and updates the Task Center.
Worker: Pulls scheduled jobs from the Task Center and assigns execution resources.
Component SDK: Provides a uniform interface for component developers.
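To make the module boundaries concrete, here is a minimal Python sketch of the Trigger handing work to the Task Center. Everything in it is an assumption for illustration: `TriggerSource`, `TriggerEvent`, `TaskCenter.create_run`, and `on_trigger` are hypothetical names, not Meituan's API, and persistence, event emission, and scheduling are omitted. It only shows the idea that every trigger source (PR, push, API, cron) is normalized into one event type before the Task Center creates unstarted jobs.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class TriggerSource(Enum):
    PR = "pr"
    PUSH = "push"
    API = "api"
    CRON = "cron"


@dataclass(frozen=True)
class TriggerEvent:
    pipeline_id: str
    source: TriggerSource
    ref: str  # branch or commit the run should use


@dataclass
class Job:
    job_id: int
    run_id: int
    component: str
    status: str = "unstart"


class TaskCenter:
    """Hypothetical in-memory Task Center: owns pipeline runs and their jobs."""

    def __init__(self):
        self._run_ids = count(1)
        self._job_ids = count(1)
        self.runs = {}  # run_id -> TriggerEvent that started it
        self.jobs = {}  # job_id -> Job

    def create_run(self, event, components):
        run_id = next(self._run_ids)
        self.runs[run_id] = event
        for name in components:
            job = Job(next(self._job_ids), run_id, name)
            self.jobs[job.job_id] = job
        return run_id


def on_trigger(task_center, raw_source, pipeline_id, ref, components):
    """Trigger: normalize any source into one event type; unknown sources raise ValueError."""
    event = TriggerEvent(pipeline_id, TriggerSource(raw_source), ref)
    return task_center.create_run(event, components)
```

Keeping trigger handling this thin means adding a new source only touches the enum and the adapter that calls `on_trigger`, not the Task Center.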
Core Design Points
Job Scheduling Design
A typical pipeline (source checkout → parallel code‑scan & build → deploy) follows these steps:
Trigger creates component jobs in the Task Center and emits events.
Decision Engine receives the event, determines which waiting jobs can be scheduled, and submits that decision to the Task Center.
Task Center updates job status to pending and places them in the queue.
Workers long‑poll the queue, pull jobs, execute them, and report results.
Task Center updates status based on worker feedback and triggers the next decision round.
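The five steps above can be sketched as a single-process loop, assuming an in-memory Task Center and one worker; the real engine runs these modules as separate, horizontally scaled services. `decide`, `worker_poll`, and `run_pipeline` are hypothetical names, the readiness rule (a job is schedulable once all its upstream jobs have finished) is an assumption about how the Decision Engine evaluates jobs, and `queue.Queue.get` with a timeout only approximates a worker's long poll.

```python
import queue

# Example pipeline from the text: checkout -> (scan, build in parallel) -> deploy.
UPSTREAM = {
    "checkout": [],
    "scan": ["checkout"],
    "build": ["checkout"],
    "deploy": ["scan", "build"],
}


class TaskCenter:
    """In-memory stand-in for the Task Center: job status plus the waiting queue."""

    def __init__(self, upstream):
        # Step 1 (Trigger) is represented by creating every job as "unstart".
        self.upstream = upstream
        self.status = {job: "unstart" for job in upstream}
        self.waiting = queue.Queue()

    def mark_pending(self, jobs):
        # Step 3: record the decision and place the jobs in the waiting queue.
        for job in jobs:
            self.status[job] = "pending"
            self.waiting.put(job)

    def report(self, job, ok):
        # Step 5: update status from worker feedback.
        self.status[job] = "finished" if ok else "failed"


def decide(tc):
    """Step 2 (Decision Engine): an unstarted job is ready once all upstream jobs finished."""
    return sorted(
        job
        for job, status in tc.status.items()
        if status == "unstart"
        and all(tc.status[up] == "finished" for up in tc.upstream[job])
    )


def worker_poll(tc, timeout, failing=frozenset()):
    """Step 4 (Worker): a blocking get with a timeout approximates long polling."""
    try:
        job = tc.waiting.get(timeout=timeout)
    except queue.Empty:
        return None
    tc.status[job] = "scheduled"
    # Real execution is omitted; jobs listed in `failing` simulate a failed run.
    tc.report(job, ok=job not in failing)
    return job


def run_pipeline(tc, failing=frozenset()):
    """Alternate decision rounds and worker pulls until nothing more can be scheduled."""
    rounds = []
    while True:
        ready = decide(tc)
        if not ready:
            break
        tc.mark_pending(ready)
        rounds.append(ready)
        # Drain the queue; in the real engine, many workers pull concurrently.
        while worker_poll(tc, timeout=0.01, failing=failing) is not None:
            pass
    return rounds
```

For the example pipeline, `run_pipeline(TaskCenter(UPSTREAM))` returns `[['checkout'], ['build', 'scan'], ['deploy']]`: scan and build become schedulable in the same decision round, which is where parallelism comes from. If build fails, deploy is never decided and stays `unstart`.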
Job states follow a state machine (unstart → pending → scheduled → finished/failed). Each transition is applied as an optimistic‑lock database update, and compensation mechanisms recover jobs that are lost or dispatched more than once.
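A common way to implement an optimistic-lock transition is a compare-and-set `UPDATE` guarded by the expected status and a version column. The sketch below assumes that scheme, uses SQLite only so it runs standalone, and invents the `job` table and the `transition` helper for illustration; the compensation side (for example, re-enqueuing lost jobs) is not shown.

```python
import sqlite3

# Legal transitions of the job state machine described above.
ALLOWED = {
    "unstart": {"pending"},
    "pending": {"scheduled"},
    "scheduled": {"finished", "failed"},
}


def create_schema(conn):
    # Hypothetical table: the version column carries the optimistic lock.
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    conn.commit()


def add_job(conn, job_id):
    conn.execute("INSERT INTO job (id, status, version) VALUES (?, 'unstart', 0)", (job_id,))
    conn.commit()


def read_job(conn, job_id):
    return conn.execute("SELECT status, version FROM job WHERE id = ?", (job_id,)).fetchone()


def transition(conn, job_id, expected_status, expected_version, new_status):
    """Compare-and-set: succeeds only if nobody changed the row since it was read."""
    if new_status not in ALLOWED.get(expected_status, set()):
        raise ValueError(f"illegal transition {expected_status} -> {new_status}")
    cur = conn.execute(
        "UPDATE job SET status = ?, version = version + 1 "
        "WHERE id = ? AND status = ? AND version = ?",
        (new_status, job_id, expected_status, expected_version),
    )
    conn.commit()
    # Zero rows updated means another writer won the race.
    return cur.rowcount == 1
```

If two workers read the same `pending` row and both try to claim it, only the first `UPDATE` matches the expected version; the second sees `rowcount == 0` and can drop the duplicate instead of running the job twice.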
Resource‑Pool Design
Resources are grouped into pools, and jobs are matched to pools through labels. A job's label is derived from two dimensions:
Component dimension: Groups resources needed by a specific component type (e.g., SSD, dev‑env).
Pipeline dimension: Reflects business‑level isolation requirements.
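As a sketch of the two-dimension label, assume a label is simply the pair (component group, isolation group). The lookup tables and the names `Label`, `derive_label`, `"cpp-build"`, and `"shared"` below are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    component_group: str  # resources a component type needs, e.g. "ssd" or "dev-env"
    isolation_group: str  # business-level isolation; "shared" when none is required


# Hypothetical lookup tables; in practice these would come from configuration.
COMPONENT_REQUIREMENTS = {"cpp-build": "ssd", "integration-test": "dev-env"}
PIPELINE_ISOLATION = {"autonomous-delivery": "autonomous-delivery"}


def derive_label(component_type, business_line):
    """Combine the component dimension and the pipeline dimension into one label."""
    return Label(
        component_group=COMPONENT_REQUIREMENTS.get(component_type, "default"),
        isolation_group=PIPELINE_ISOLATION.get(business_line, "shared"),
    )
```

Making the label a frozen value type lets it serve directly as a dictionary key for the per-label queue.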
Each label maps one‑to‑one to a job queue, while labels and resource pools are many‑to‑many: a label can be served by several pools, and a pool can serve several labels. This enables fine‑grained quota control, high‑priority weighting, and graceful degradation.
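The queue and pool relationships can be sketched as follows, assuming workers in a pool pull from the queues of the labels that pool serves, choosing among them by weight and respecting a per-pool, per-label quota. `PoolScheduler`, the weights, and the quota semantics are illustrative assumptions, not Meituan's implementation.

```python
import random
from collections import deque


class PoolScheduler:
    """Labels map 1:1 to queues; pools and labels are many-to-many."""

    def __init__(self):
        self.queues = {}   # label -> deque of job ids (exactly one queue per label)
        self.serves = {}   # pool -> {label: weight}; a higher weight means higher priority
        self.quota = {}    # (pool, label) -> max jobs of that label running in the pool
        self.running = {}  # (pool, label) -> jobs currently running

    def add_pool(self, pool, weights, quotas=None):
        quotas = quotas or {}
        self.serves[pool] = dict(weights)
        for label in weights:
            self.quota[(pool, label)] = quotas.get(label, float("inf"))
            self.running[(pool, label)] = 0

    def submit(self, label, job_id):
        self.queues.setdefault(label, deque()).append(job_id)

    def pull(self, pool, rng=random):
        """Pick a non-empty, under-quota label served by this pool, weighted by priority."""
        candidates = [
            (label, weight)
            for label, weight in self.serves[pool].items()
            if self.queues.get(label)
            and self.running[(pool, label)] < self.quota[(pool, label)]
        ]
        if not candidates:
            return None
        labels, weights = zip(*candidates)
        label = rng.choices(labels, weights=weights, k=1)[0]
        self.running[(pool, label)] += 1
        return label, self.queues[label].popleft()

    def release(self, pool, label):
        # Called when a job finishes, freeing quota for that label in that pool.
        self.running[(pool, label)] -= 1
```

If `ssd` is served by both a dedicated `ssd-pool` and a `shared-pool`, the shared pool keeps draining SSD jobs once the dedicated pool hits its quota, which is one way the many-to-many mapping supports graceful degradation.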
Component Layered Design
The component architecture is split into three layers:
Business layer: Provides adapters for diverse component logic.
System‑interaction layer: Defines a standard workflow (init(), run(), queryResult(), uploadArtifacts()) that all components must implement.
Execution‑resource layer: Abstracts different execution environments (container, VM, custom image).
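The three layers map naturally onto abstract classes. The sketch below assumes a Python SDK, renames `queryResult()` and `uploadArtifacts()` to `query_result()` and `upload_artifacts()` per Python convention, and uses a hypothetical `MavenBuild` adapter, `LocalContainer` resource, and artifact path; none of these come from the source.

```python
from abc import ABC, abstractmethod


class ExecutionResource(ABC):
    """Execution-resource layer: hides whether a step runs in a container, VM, or custom image."""

    @abstractmethod
    def execute(self, command):
        """Run a command and return its exit code."""


class LocalContainer(ExecutionResource):
    """Hypothetical stand-in that records commands instead of running them."""

    def __init__(self):
        self.log = []

    def execute(self, command):
        self.log.append(command)
        return 0


class Component(ABC):
    """System-interaction layer: every component implements the same four steps."""

    def __init__(self, resource):
        self.resource = resource

    @abstractmethod
    def init(self, params): ...

    @abstractmethod
    def run(self): ...

    @abstractmethod
    def query_result(self): ...

    @abstractmethod
    def upload_artifacts(self): ...


class MavenBuild(Component):
    """Business layer: a hypothetical adapter for one concrete tool."""

    def init(self, params):
        self.goal = params.get("goal", "package")
        self.exit_code = None

    def run(self):
        self.exit_code = self.resource.execute(f"mvn {self.goal}")

    def query_result(self):
        return "finished" if self.exit_code == 0 else "failed"

    def upload_artifacts(self):
        return ["target/app.jar"]  # hypothetical artifact path


def drive(component, params):
    """Engine side: the standard workflow, identical for every component."""
    component.init(params)
    component.run()
    status = component.query_result()
    artifacts = component.upload_artifacts() if status == "finished" else []
    return status, artifacts
```

Because `drive` depends only on the four standard methods, supporting a new tool means writing a business-layer adapter, and supporting a new environment means writing an `ExecutionResource` subclass; neither change touches the engine.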
Beyond these four standard methods, additional event types (e.g., pause and callback) let components implement extensions such as manual approval without breaking existing flows.
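One way to model the extra events is to let a component answer the start event with a pause, so the engine parks the job without holding a worker until a callback arrives. The `Event` enum, the `paused` status, and the `Engine` and `ManualApproval` classes below are hypothetical; the `paused` state in particular is an assumption that goes beyond the state machine listed earlier.

```python
from enum import Enum


class Event(Enum):
    START = "start"
    PAUSE = "pause"
    CALLBACK = "callback"


class ManualApproval:
    """Hypothetical component that waits for a human decision delivered by callback."""

    def __init__(self):
        self.status = "unstart"
        self.approver = None

    def handle(self, event, payload=None):
        if event is Event.START and self.status == "unstart":
            self.status = "paused"
            return Event.PAUSE  # ask the engine to park this job
        if event is Event.CALLBACK and self.status == "paused":
            self.approver = payload["user"]
            self.status = "finished" if payload["approved"] else "failed"
            return None
        raise ValueError(f"unexpected {event.name} in status {self.status}")


class Engine:
    """Hypothetical engine loop that understands the extra PAUSE and CALLBACK events."""

    def __init__(self):
        self.parked = {}  # job_id -> component waiting for a callback

    def start(self, job_id, component):
        if component.handle(Event.START) is Event.PAUSE:
            self.parked[job_id] = component  # no worker is held while parked
        return component.status

    def callback(self, job_id, payload):
        component = self.parked.pop(job_id)
        component.handle(Event.CALLBACK, payload)
        return component.status
```

Components that never emit `PAUSE` pass through `start` unchanged, which is why adding these event types does not disturb existing flows.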
Future Plans
Leverage serverless and other cloud‑native technologies to achieve lighter, more elastic resource management.
Provide an end‑to‑end component development and operation platform to lower entry barriers for tool developers.
Authors
Geng Jie, Chun Hui, Zhi Yuan, and other members of Meituan's R&D Quality & Efficiency Platform team.
This article has been distilled and summarized from source material, then republished for learning and reference.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
