Ctrip’s Experience with Windows Containers: Architecture, Migration, Storage, Networking, and Orchestration
This article details Ctrip’s practical investigation of Windows containers: why the team adopted them for .NET workloads, the migration process, container types, image handling, storage and network models, orchestration choices such as Docker Swarm and Mesos, and the remaining challenges and future plans.
Ctrip’s cloud platform team began a systematic study of Windows containers in the second half of 2023 to serve its large .NET estate, which comprises more than 3,000 core applications across more than 20 business units.
Why Windows containers? The legacy .NET services run on virtual machines, which have coarse‑grained resource isolation, long provisioning cycles, and slow scaling. Windows containers promise finer‑grained isolation, deployments that complete in seconds, and a unified model that aligns with Linux containers for storage, networking, and orchestration.
Container types explored include Windows Server containers and Hyper‑V containers. Server containers share the host kernel and allow direct process management, while Hyper‑V containers run in a lightweight VM, offering stronger isolation at a modest performance cost. Both are built on Windows Server 2016 (Server Core or Nano Server).
Image construction follows the Docker workflow: Dockerfiles are used to build layered images, which are then pushed to a private Harbor registry integrated with Ctrip’s Active Directory. Base images for Windows Server are large (≈8 GB) and require careful bandwidth management.
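The build‑and‑push step can be sketched as follows. This is a minimal illustration, not Ctrip’s actual tooling: the registry host `harbor.example.internal`, the project `dotnet`, and the image name `orders-api` are hypothetical, and the script only assembles the standard `docker build` and `docker push` commands rather than running them.

```python
from typing import List


def image_ref(registry: str, project: str, name: str, tag: str) -> str:
    """Return a fully qualified reference of the form registry/project/name:tag."""
    return f"{registry}/{project}/{name}:{tag}"


def build_and_push_commands(ref: str, context_dir: str = ".") -> List[List[str]]:
    """Return the docker CLI invocations that build an image and push it."""
    return [
        ["docker", "build", "-t", ref, context_dir],  # build from the Dockerfile in context_dir
        ["docker", "push", ref],                      # upload the layers to the private registry
    ]


if __name__ == "__main__":
    # Hypothetical registry, project, and image names, for illustration only.
    ref = image_ref("harbor.example.internal", "dotnet", "orders-api", "1.0.0")
    for cmd in build_and_push_commands(ref):
        print(" ".join(cmd))
```

Because only changed layers are uploaded on a push, keeping the large Windows base layer stable across builds is one way to limit the bandwidth each push consumes.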
Storage solutions comprise three layers: immutable image storage, Docker volumes for persistent data, and SMB‑based network storage for shared datasets. Volume mapping and symbolic links are employed to emulate legacy D‑drive requirements.
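As a rough sketch of the volume‑mapping approach, the snippet below assembles a `docker run` command that mounts a host directory at a drive letter inside the container. The host path, image name, and the choice of `D:` as the mount target are assumptions for illustration; the exact destination form accepted can vary with the Docker and Windows versions in use, and the source does not describe Ctrip’s actual mappings.

```python
from typing import List, Tuple


def run_command(image: str, mounts: List[Tuple[str, str]]) -> List[str]:
    """Build a detached `docker run` command with one -v flag per (host, target) pair."""
    cmd = ["docker", "run", "-d"]
    for host_dir, container_target in mounts:
        # -v takes host_path:container_path; the target here is a drive letter
        cmd += ["-v", f"{host_dir}:{container_target}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical host directory and image; D: emulates the legacy data drive.
    cmd = run_command(
        "harbor.example.internal/dotnet/orders-api:1.0.0",
        [(r"C:\container-data\orders", "D:")],
    )
    print(" ".join(cmd))
```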
Networking models include NAT (simple, suited to job‑type workloads), Transparent (used in production; relies on MAC‑address spoofing), L2 bridge (a flat, OpenStack‑style network), and L2 tunnel (as used on Azure). Hyper‑V hosts also support Switch Embedded Teaming (SET) with VLAN tagging to achieve multi‑tenant isolation.
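To make the network modes concrete, here is a small sketch that maps each mode to a `docker network create` command. The driver names `nat`, `transparent`, `l2bridge`, and `l2tunnel` are the Windows Docker network drivers; mapping the article’s tunnel mode to `l2tunnel` is an assumption. The VLAN option name follows Microsoft’s container networking documentation, and the network name and VLAN ID are hypothetical.

```python
from typing import List, Optional

# Article's mode names mapped to Windows Docker network drivers.
# The "tunnel" -> "l2tunnel" mapping is an assumption.
DRIVERS = {
    "nat": "nat",
    "transparent": "transparent",
    "l2bridge": "l2bridge",
    "tunnel": "l2tunnel",
}


def network_create_command(mode: str, name: str,
                           vlan_id: Optional[int] = None) -> List[str]:
    """Return a `docker network create` command for the given mode."""
    if mode not in DRIVERS:
        raise ValueError(f"unknown network mode: {mode}")
    cmd = ["docker", "network", "create", "-d", DRIVERS[mode]]
    if vlan_id is not None:
        # VLAN tagging option as documented for Windows container networks
        cmd += ["-o", f"com.docker.network.windowsshim.vlanid={vlan_id}"]
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    # Hypothetical network name and VLAN ID.
    print(" ".join(network_create_command("transparent", "prod-net", vlan_id=11)))
```

Tagging each tenant’s network with its own VLAN ID is one way the multi‑tenant isolation described above could be expressed at the Docker level.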
Orchestration uses Docker Compose for single‑host scenarios, Docker Swarm for lightweight clustering, and Mesos + Marathon as a unified scheduler that can manage both Linux and Windows containers. To run Mesos on Windows Server, the team is recompiling its binaries.
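For the Mesos + Marathon path, a Windows workload would be submitted as a Marathon app definition. The sketch below builds one as JSON. It assumes agents advertise a hypothetical `os` attribute so that a `LIKE` constraint can steer the app to Windows hosts; the app ID, image, and resource figures are illustrative and not taken from Ctrip’s setup.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, cpus: float, mem_mb: int,
                 instances: int, os_name: str = "windows") -> Dict[str, Any]:
    """Build a Marathon app definition for a Docker container pinned to one OS."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
        "container": {"type": "DOCKER", "docker": {"image": image}},
        # Assumes each Mesos agent sets an attribute named "os" (hypothetical).
        "constraints": [["os", "LIKE", os_name]],
    }


if __name__ == "__main__":
    app = marathon_app(
        "/dotnet/orders-api",
        "harbor.example.internal/dotnet/orders-api:1.0.0",
        cpus=1.0, mem_mb=2048, instances=2,
    )
    print(json.dumps(app, indent=2))
```

The same definition shape works for Linux containers by changing the constraint value, which is what makes a single scheduler for both platforms attractive.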
Migration outcomes show consistent build environments, a 50% reduction in host resources, and a drop in build time from several minutes to roughly 90 seconds per job. However, challenges remain: no GUI support, limited RDP capability (mitigated by installing SSHD), D‑drive emulation, and the bandwidth needed to distribute large images.
Open issues include optimizing image layer distribution, establishing robust monitoring and logging for Windows containers, and scaling management for thousands of containers in production.
The article concludes with a call for community feedback and outlines future work toward seamless deployments, completed in seconds, across development, testing, and production environments.