Design and Implementation of Ctrip's Cloud Desktop System Based on OpenStack
This article describes how Ctrip deployed a large‑scale virtual cloud desktop solution for its call center. It covers the motivation, the original OpenStack‑based architecture and its limitations, the redesigned decoupled architecture, and the operational practices (resource over‑commit, network tuning, monitoring, and automated testing) that keep the platform stable and scalable.
Ctrip's Technical Center launched a cloud desktop service for its call center, replacing traditional PC workstations with thin‑client virtual desktops managed by OpenStack. The shift dramatically reduces IT operation costs, improves fault‑handling speed, and lowers power consumption.
The original architecture built the cloud desktop platform directly on OpenStack Nova, with Keystone for authentication and Horizon for management. This tight coupling limited flexibility and meant that any OpenStack upgrade required extensive retesting of the desktop platform.
To overcome these issues, a new decoupled architecture was introduced: a VMPool layer and an Allocator layer handle virtual machine provisioning independently of OpenStack, while a dedicated portal replaces Horizon for IT operations. Users are authenticated via LDAP, and allocation rules are applied based on user group, organizational unit (OU), and tags.
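The rule‑based allocation described above can be sketched as follows. This is a minimal illustration, not Ctrip's implementation: the `User`, `Rule`, and `Allocator` classes, the pool names, and the first‑match‑wins policy are all assumptions made for the example, and the LDAP lookup that would populate a user's OU, groups, and tags is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class User:
    """Attributes assumed to come from an LDAP lookup (hypothetical shape)."""
    name: str
    ou: str
    groups: frozenset = frozenset()
    tags: frozenset = frozenset()


@dataclass
class Rule:
    """Maps users to a VM pool; a criterion left as None matches any user."""
    pool: str
    ou: Optional[str] = None
    group: Optional[str] = None
    tag: Optional[str] = None

    def matches(self, user: User) -> bool:
        return ((self.ou is None or user.ou == self.ou)
                and (self.group is None or self.group in user.groups)
                and (self.tag is None or self.tag in user.tags))


class Allocator:
    """Hands out pre-provisioned VMs from pools, independent of OpenStack."""

    def __init__(self, rules: List[Rule], pools: Dict[str, List[str]]):
        self.rules = rules    # evaluated in order, first match wins (assumed policy)
        self.pools = pools    # pool name -> list of idle VM ids

    def allocate(self, user: User) -> Optional[str]:
        for rule in self.rules:
            idle = self.pools.get(rule.pool)
            if rule.matches(user) and idle:
                return idle.pop(0)
        return None           # no matching pool has an idle VM


# Usage with hypothetical pools and rules.
pools = {"agents-pool": ["vm-1", "vm-2"], "default-pool": ["vm-9"]}
rules = [
    Rule(pool="agents-pool", ou="CallCenter", tag="agent"),
    Rule(pool="default-pool"),  # catch-all rule
]
allocator = Allocator(rules, pools)
alice = User("alice", ou="CallCenter", tags=frozenset({"agent"}))
bob = User("bob", ou="Finance")
print(allocator.allocate(alice))
print(allocator.allocate(bob))
```

Because the allocator only sees pools of ready VMs, the code that actually talks to OpenStack to refill a pool can change without touching the allocation rules, which is the point of the decoupling.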
Large‑scale deployment across six call‑center sites (Shanghai, Nantong, Rugao, Hefei, Xinyang, Muling) now supports hundreds of compute nodes and nearly ten thousand seats, with fault rates far lower than traditional PCs.
Key operational challenges addressed include:
Software version selection: careful testing of KVM, QEMU, Open vSwitch, kernel, and libvirt versions to ensure 7×24 stability.
Resource over‑commit: memory over‑commit is limited to roughly 1:1.2 and CPU over‑commit to 1:2, and I/O is throttled to avoid startup storms.
Network details: fixing multi‑instance DNSMasq DHCP renewal issues, correcting libvirt‑Open vSwitch start‑stop order to prevent VM network loss, and enabling RabbitMQ TCP keep‑alive to avoid lost messages.
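To make the over‑commit ratios above concrete, the following sketch estimates how many desktops one compute node could host. It assumes the ratios are read as physical:virtual (1 GB of RAM backs about 1.2 GB of guest memory, 1 core backs 2 vCPUs); the host and VM sizes, the reserved‑memory figure, and the `max_desktops` function are hypothetical values chosen for illustration, not numbers from Ctrip's deployment.

```python
import math
from fractions import Fraction

# Ratios taken from the text; the memory ratio is approximate in the source.
MEM_OVERCOMMIT = Fraction(6, 5)   # 1:1.2
CPU_OVERCOMMIT = Fraction(2, 1)   # 1:2


def max_desktops(host_cores, host_mem_gb, vm_vcpus, vm_mem_gb, reserved_mem_gb=0):
    """Return how many VMs fit on one host under both over-commit limits."""
    vcpu_budget = host_cores * CPU_OVERCOMMIT
    # Memory reserved for the hypervisor itself is not over-committed.
    mem_budget = (host_mem_gb - reserved_mem_gb) * MEM_OVERCOMMIT
    by_cpu = math.floor(vcpu_budget / vm_vcpus)
    by_mem = math.floor(mem_budget / vm_mem_gb)
    # The tighter of the two limits decides the host's capacity.
    return min(by_cpu, by_mem)


# Hypothetical host: 16 cores, 64 GB RAM, 4 GB reserved; each desktop has 1 vCPU and 4 GB.
print(max_desktops(16, 64, vm_vcpus=1, vm_mem_gb=4, reserved_mem_gb=4))
```

In this example memory is the binding constraint: the CPU limit would allow 32 desktops, but the memory limit allows only 18. Exact fractions are used so that the floor operation is not affected by floating‑point rounding.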
Operational tools include two SaltStack systems for thin‑client and VM management, a custom portal for overall control, extensive business‑level monitoring (user activity, resource usage, OpenStack alerts), and a fully automated testing lab that simulates user input on thin clients 24/7.
These practices collectively ensure the cloud desktop platform remains reliable, scalable, and adaptable to evolving business needs.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.