Ctrip Container Cloud Operations: Practices, Challenges, and Future Outlook
This article presents Ctrip's experience in building and operating a private container cloud platform, detailing its architectural evolution, operational challenges, tooling, monitoring, capacity management, and future directions toward hybrid and cloud‑native environments.
Ctrip operates a hybrid cloud platform that combines self‑built data centers with public cloud resources to support thousands of applications across ticketing, hotel, and travel services. The platform enables rapid feature iteration and on‑demand scaling, and it handles more than 10,000 changes per week.
The container cloud evolved through three stages: an initial OpenStack‑based implementation (2013‑2015), a custom Mesos‑based scheduler (2016), and a full migration to Kubernetes (2017‑2018). The shift to Kubernetes introduced a PaaS model where users request services rather than individual machines.
Operational challenges include rapid growth in the number of IP addresses, which requires SDN‑based isolation; CPU and network resource isolation for heterogeneous workloads; handling defunct (zombie) processes; and managing frequent Docker version upgrades. Ctrip addresses these with a unified CDOS layer that abstracts compute, network, and storage resources.
Key operational tools include SaltStack for configuration management and Rundeck for job scheduling and runbook execution, a custom monitoring system called Ctrip‑Hickwall (built on Prometheus), the ELK and TIGK stacks plus Elastic Beats for logging and metrics collection, and StackStorm for ChatOps automation. Together they provide real‑time alerting, centralized log aggregation, and automated incident response through chat bots.
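The automated incident response described above can be pictured as matching incoming alerts against rules that each name a remediation action. The following is a minimal sketch of that idea in plain Python, not the StackStorm API or Ctrip's implementation; the alert names, severity levels, and the `restart_docker` and `notify_oncall` actions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical severity scale; the source does not define one.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Alert:
    name: str
    host: str
    severity: str


@dataclass
class Rule:
    alert_name: str                     # alert this rule reacts to
    min_severity: str                   # lowest severity that triggers the action
    action: Callable[[Alert], str]      # returns a description of what it would do


def restart_docker(alert: Alert) -> str:
    # Placeholder action: a real bot would call an automation backend here.
    return f"restart docker on {alert.host}"


def notify_oncall(alert: Alert) -> str:
    return f"page on-call about {alert.name} on {alert.host}"


RULES: List[Rule] = [
    Rule("docker_daemon_down", "warning", restart_docker),
    Rule("host_load_high", "critical", notify_oncall),
]


def dispatch(alert: Alert, rules: List[Rule] = RULES) -> Optional[str]:
    """Run the first rule whose name matches and whose severity threshold is met."""
    for rule in rules:
        severe_enough = (
            SEVERITY_ORDER[alert.severity] >= SEVERITY_ORDER[rule.min_severity]
        )
        if rule.alert_name == alert.name and severe_enough:
            return rule.action(alert)
    return None  # no rule matched; leave the alert for a human


print(dispatch(Alert("docker_daemon_down", "host-7", "critical")))
```

Keeping the rules as data rather than branching logic means a new remediation can be added without touching the dispatcher.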
Processes focus on continuous monitoring of platform changes, trend analysis, and capacity planning. Ctrip monitors metrics such as host load, container counts, and image service health across multiple data centers, using dashboards to detect resource saturation and guide scaling decisions.
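As a rough illustration of how such metrics might be screened for saturation, the sketch below groups hosts that exceed a load or container-count limit by data center. The `HostSample` type, the field names, and the threshold values are assumptions made for the example; the source does not state which thresholds Ctrip uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class HostSample:
    datacenter: str
    host: str
    load_ratio: float   # assumed metric: load average divided by CPU core count
    containers: int     # number of containers running on the host


def saturated_hosts(
    samples: Iterable[HostSample],
    load_limit: float = 0.8,      # hypothetical threshold
    container_limit: int = 50,    # hypothetical threshold
) -> Dict[str, List[str]]:
    """Return hosts that exceed either limit, grouped by data center."""
    flagged: Dict[str, List[str]] = defaultdict(list)
    for s in samples:
        if s.load_ratio > load_limit or s.containers > container_limit:
            flagged[s.datacenter].append(s.host)
    return dict(flagged)


samples = [
    HostSample("dc-a", "host-1", 0.95, 30),  # over the load limit
    HostSample("dc-a", "host-2", 0.40, 20),  # healthy
    HostSample("dc-b", "host-3", 0.50, 64),  # over the container limit
]
print(saturated_hosts(samples))
# → {'dc-a': ['host-1'], 'dc-b': ['host-3']}
```

Grouping by data center mirrors the multi-data-center view described above, so a dashboard could show at a glance where scaling is needed.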
Capacity management leverages monitoring data, Hadoop analytics, and PaaS scheduling to predict resource usage, optimize host utilization, and control costs. The goal is to achieve elastic computing while maintaining stability.
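One simple way to turn monitoring history into a capacity prediction is to fit a straight line to recent usage and estimate when it will reach the available capacity. The sketch below does this with an ordinary least-squares fit in plain Python. It is an illustrative stand-in, not the model Ctrip runs on Hadoop, and the function names and sample figures are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, ys[0]), (1, ys[1]), ... Returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_full(daily_usage: Sequence[float], capacity: float) -> Optional[float]:
    """Estimate days from the latest sample until the trend line reaches capacity.

    Returns None when usage is flat or falling, since the line never reaches capacity.
    """
    slope, intercept = linear_fit(daily_usage)
    if slope <= 0:
        return None
    day_hit = (capacity - intercept) / slope
    return max(0.0, day_hit - (len(daily_usage) - 1))


# Hypothetical daily CPU-core usage for one resource pool.
usage = [100, 110, 120, 130]
print(days_until_full(usage, capacity=160))
```

A linear trend ignores seasonality such as holiday travel peaks, so in practice a forecast like this would only serve as a first-pass signal.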
Looking ahead, Ctrip plans to integrate public‑cloud services (Alibaba Cloud, AWS) into a unified hybrid‑cloud management framework, continue advancing Kubernetes adoption, and further develop cloud‑native DevOps practices.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.