
Ctrip’s Continuous Delivery Practices and Unified Build Platform with Jenkins on Kubernetes

This article describes Ctrip’s large‑scale continuous delivery system, its benefits for efficiency, quality, reliability and team collaboration, the evolution of its deployment models, the design of a unified Jenkins‑based build platform, and practical experiences running Jenkins on Kubernetes with elastic scheduling and workspace management.

Ctrip Technology

Ctrip runs more than 8,000 applications built by more than 3,000 developers, with over 6,000 deployments per day. Its continuous delivery (CD) system aims to improve efficiency, ensure quality through integrated code scanning and testing, improve reliability by reducing manual steps, tighten team collaboration, and make the process more transparent through unified tools and standards.

The CD workflow starts with developers pushing code, followed by automated scanning, unit and integration tests, version creation, packaging, and deployment to test environments. Successful tests trigger notifications to automated testing platforms, after which QA approves promotion to subsequent environments, ultimately reaching production.
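The ordering of these stages can be sketched as a fail-fast pipeline. This is only an illustration: the stage names mirror the steps listed above, while `run_pipeline` and the always-passing stage functions are hypothetical placeholders, not Ctrip's actual tooling.

```python
from typing import Callable, List, Tuple

# A stage is a (name, step) pair; the step returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names_of_completed_stages).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


# Placeholder steps that always pass; a real step would call a scanner,
# a test runner, a packager, and so on.
stages: List[Stage] = [
    ("scan", lambda: True),
    ("unit_test", lambda: True),
    ("integration_test", lambda: True),
    ("create_version", lambda: True),
    ("package", lambda: True),
    ("deploy_test_env", lambda: True),
]

ok, done = run_pipeline(stages)
```

Stopping at the first failed stage means a version is never created or packaged from code that failed scanning or tests.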

Branch management relies on a master branch and multiple feature branches, with temporary integration branches used to surface conflicts early. When an urgent bug arises, a hotfix branch is created and then merged back into both master and the temporary integration branch.

Versioning goes beyond simple Git tags; Ctrip packages source code into explicit build artifacts to capture both code and dependency states, including environment dependencies handled via containers.
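One way to picture a version that captures more than a Git tag is an identifier derived from the commit, the resolved dependencies, and the container base image together. The sketch below is an assumption for illustration only: the `BuildArtifact` type, its fields, and the hashing scheme are hypothetical and do not describe Ctrip's artifact format.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BuildArtifact:
    """Hypothetical record of everything that went into a build."""
    commit: str                                 # source revision
    dependencies: Tuple[Tuple[str, str], ...]   # (name, version) pairs
    base_image: str                             # container image providing the environment

    def version_id(self) -> str:
        """Derive a short identifier from code, dependencies, and environment."""
        h = hashlib.sha256()
        h.update(self.commit.encode("utf-8"))
        # Sort so the identifier does not depend on declaration order.
        for name, version in sorted(self.dependencies):
            h.update(f"{name}={version}".encode("utf-8"))
        h.update(self.base_image.encode("utf-8"))
        return h.hexdigest()[:12]


a = BuildArtifact("abc123", (("libfoo", "1.0"),), "base:1")
b = BuildArtifact("abc123", (("libfoo", "1.1"),), "base:1")
```

Here `a` and `b` share a commit but differ in one dependency version, so they get different identifiers, which a plain Git tag would not distinguish.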

The deployment model evolved from VM-based deployments with multiple applications per machine, to one application per machine, then to “fat containers” (containers that behave like VMs), and finally to a container-centric approach. Key concepts include:

Group – a set of machines exposing the same service.

Pull‑in / Pull‑out – controlling traffic flow to group members.

Bastion – the first validated machine in production, similar to a canary.

Ignition – application initialization and warm‑up, marking successful deployment.

Batch – rolling deployment in multiple batches.

Downgrade – bypassing traffic control to ensure deployment success during failures.

Brake – halting deployment when failure rates exceed a threshold.

Rollback – reverting to a previous stable version.
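How several of these concepts fit together can be sketched as a small rolling-deployment loop: the bastion goes first, the remaining machines follow in batches, and the brake stops the rollout once the failure rate crosses a threshold. Everything here is a simplified assumption. `rolling_deploy` and `deploy_one` are hypothetical names, and `deploy_one` stands in for the pull-out, install, ignition, and pull-in of a single machine.

```python
from typing import Callable, List


def rolling_deploy(
    group: List[str],
    deploy_one: Callable[[str], bool],
    batch_size: int,
    brake_threshold: float,
) -> str:
    """Deploy to a group; return "done" or "braked".

    deploy_one(machine) is assumed to pull the machine out of traffic,
    install the new version, wait for ignition, pull the machine back in,
    and return True on success.
    """
    if not group:
        return "done"

    # Bastion: the first machine validates the release before anything else.
    bastion, rest = group[0], group[1:]
    if not deploy_one(bastion):
        return "braked"

    attempted, failures = 1, 0
    for start in range(0, len(rest), batch_size):
        # Batch: roll forward a few machines at a time.
        for machine in rest[start:start + batch_size]:
            attempted += 1
            if not deploy_one(machine):
                failures += 1
        # Brake: halt when the failure rate exceeds the threshold.
        if failures / attempted > brake_threshold:
            return "braked"
    return "done"


machines = ["m0", "m1", "m2", "m3", "m4"]
status_ok = rolling_deploy(machines, lambda m: True, 2, 0.25)
status_bad = rolling_deploy(machines, lambda m: m not in {"m1", "m2"}, 2, 0.25)
```

A caller that receives "braked" would typically trigger a rollback to the previous stable version; that step is left out of the sketch.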

The unified build platform is built around Jenkins. Jenkins is convenient to deploy as a WAR and has a rich plugin ecosystem, but a single master is a single point of failure and scales poorly. Ctrip addressed these limitations with master-backup setups, a Keepalived virtual IP for high availability, and eventually by distributing Jenkins masters across Mesos/Kubernetes or using CloudBees.

Scaling is managed by splitting Jenkins masters based on environment, organization, product line, plugin scope, and access control. Monitoring spans OS metrics, application‑level health (API, worker, Jenkins), and business‑logic indicators such as queue congestion and capacity thresholds. Additional checks include pipeline step timeouts and container scheduling delays.
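The business-logic indicators mentioned above, queue congestion and capacity thresholds, can be sketched as a simple per-master check. The `MasterSnapshot` fields and the default limits are hypothetical values chosen for illustration, not figures from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MasterSnapshot:
    """Hypothetical point-in-time view of one Jenkins master."""
    name: str
    queued_jobs: int
    busy_executors: int
    total_executors: int


def check_master(
    snapshot: MasterSnapshot,
    max_queue: int = 20,            # assumed congestion limit
    max_utilization: float = 0.9,   # assumed capacity limit
) -> List[str]:
    """Return alert messages for queue congestion and capacity pressure."""
    alerts: List[str] = []
    if snapshot.queued_jobs > max_queue:
        alerts.append(f"{snapshot.name}: queue congestion ({snapshot.queued_jobs} queued)")
    # A master with no executors is treated as fully utilized.
    if snapshot.total_executors:
        utilization = snapshot.busy_executors / snapshot.total_executors
    else:
        utilization = 1.0
    if utilization >= max_utilization:
        alerts.append(f"{snapshot.name}: capacity threshold reached ({utilization:.0%})")
    return alerts


healthy = check_master(MasterSnapshot("master-a", 5, 2, 10))
overloaded = check_master(MasterSnapshot("master-b", 30, 10, 10))
```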

Running Jenkins on Kubernetes introduced elastic slave scheduling. Important parameters include initialDelay (set to 0 when no static slaves exist), decay (smoothing factor for EMA‑based load calculation), and a custom inequality involving m and totalSnapshot to decide slave creation. Observations showed average slave spin‑up time around 20 seconds, with occasional longer delays caused by label‑refresh intervals, which were mitigated by explicitly resetting labels.
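The role of decay is easiest to see in a minimal sketch of exponential-moving-average (EMA) smoothing. The update rule below is the standard EMA form. The provisioning rule in `should_provision` is a hypothetical stand-in and is not the inequality involving m and totalSnapshot mentioned above, which the article does not spell out. All function names are illustrative.

```python
from typing import Iterable


def ema_update(previous: float, sample: float, decay: float) -> float:
    """Standard EMA step: a higher decay keeps more history and reacts slower."""
    return previous * decay + sample * (1.0 - decay)


def smoothed_load(samples: Iterable[float], decay: float, initial: float = 0.0) -> float:
    """Fold a series of load samples into one smoothed value."""
    value = initial
    for sample in samples:
        value = ema_update(value, sample, decay)
    return value


def should_provision(queue_ema: float, idle_ema: float, margin: float) -> bool:
    """Hypothetical rule: create an agent when smoothed excess demand
    exceeds a margin-adjusted threshold. Not the rule from the article."""
    return (queue_ema - idle_ema) > (1.0 - margin)


# A queue that holds one waiting job for three consecutive samples.
slow = smoothed_load([1.0, 1.0, 1.0], decay=0.9)
fast = smoothed_load([1.0, 1.0, 1.0], decay=0.5)
```

With the same three samples, the lower decay reaches a higher smoothed load, so a threshold-based rule would fire sooner. This is why decay directly affects how quickly new slaves are requested.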

Workspace retention is handled by sharing the same node between master and slave, allowing reuse of downloaded sources and facilitating debugging. To avoid concurrent job execution on multiple masters, a higher‑level scheduler enforces affinity to the master that previously ran the job.
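The affinity rule can be sketched as a small scheduler that remembers which master last ran each job. This is an assumption-laden illustration: the `AffinityScheduler` class and its least-loaded fallback for new jobs are hypothetical and do not describe Ctrip's scheduler.

```python
from typing import Dict, List


class AffinityScheduler:
    """Hypothetical higher-level scheduler that pins each job to one master."""

    def __init__(self, masters: List[str]) -> None:
        if not masters:
            raise ValueError("at least one master is required")
        self._masters = list(masters)
        self._last_master: Dict[str, str] = {}
        self._assigned: Dict[str, int] = {m: 0 for m in self._masters}

    def assign(self, job: str) -> str:
        """Return the master for a job, reusing the previous one if known."""
        master = self._last_master.get(job)
        if master is None:
            # New job: pick the master with the fewest assignments so far.
            master = min(self._masters, key=lambda m: self._assigned[m])
            self._last_master[job] = master
        self._assigned[master] += 1
        return master


scheduler = AffinityScheduler(["master-a", "master-b"])
first = scheduler.assign("build-app")
other = scheduler.assign("build-lib")
again = scheduler.assign("build-app")
```

Because "build-app" always returns to the same master, its workspace can be reused and the job never runs on two masters at once.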

StatefulSet is used to manage Jenkins master pods, ensuring stable pod‑to‑node mapping and persistent volumes via a custom CHostpath Volume Driver. This setup simplifies master creation, updates, and scaling while keeping configuration under version control.

Two challenges remain. The first is multi-environment image duplication: each environment currently builds its own image, which leads to inconsistencies. The second is resource mixing: build infrastructure could be shared with other workloads during off-peak hours to improve data-center utilization.
