Building a Custom Kubernetes Cluster on AWS for Hybrid‑Cloud Container Deployment at Ctrip
The article describes Ctrip's motivation, design choices, and implementation details for deploying a self‑managed Kubernetes cluster on AWS, covering network architecture with VPC and ENI, custom image distribution, logging and monitoring, data synchronization, and operational challenges such as ELB loopback and kubelet configuration.
Introduction – As Ctrip expands internationally, its hybrid‑cloud platform needed a faster, more efficient way to deploy applications. Running containers on Kubernetes in AWS cut provisioning time from minutes (for VMs) to seconds and simplified operations across multiple public‑cloud providers.
Why Build Kubernetes on AWS Instead of Using Managed Services – Ctrip required direct IP connectivity to containers and fixed container IP addresses, which native Kubernetes abstractions (Service, Ingress) and existing CNI plugins (flannel, calico) could not fully satisfy. A custom solution also kept the control plane under Ctrip's management and aligned the public cloud with the private‑cloud IDC stack, so Ctrip built its own cluster rather than use Amazon EKS.
Network Design – The network is based on VPC and Elastic Network Interfaces (ENI). Each ENI provides a primary private IP, optional secondary IPs, a MAC address, and security‑group bindings. Two options were evaluated:
Single NIC with multiple IPs – pods share an ENI and each takes one of its IPs; this gives high pod density but requires complex scheduling and security‑group handling.
Single NIC with a single IP – each pod gets its own ENI carrying one IP; this is simpler, isolates each pod, and is easier to manage, at the cost of lower pod density.
Ctrip selected the single‑NIC‑single‑IP approach because it is simple and its pod density is sufficient for Java workloads. A global IP address management module (GIPAM) stores a one‑to‑one mapping between StatefulSet pod names and IPs, so a pod keeps the same IP across redeployments.
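The name‑to‑IP mapping can be illustrated with a small in‑memory sketch. This is not Ctrip's GIPAM: the class name, method names, and CIDR are hypothetical, and a real module would persist the mapping and account for the addresses AWS reserves in every subnet.

```python
import ipaddress


class Gipam:
    """Toy IP address manager keyed by pod name (hypothetical API)."""

    def __init__(self, cidr):
        network = ipaddress.ip_network(cidr)
        # hosts() skips only the network and broadcast addresses; a real
        # VPC subnet reserves more, which this sketch ignores.
        self._free = [str(ip) for ip in network.hosts()]
        self._by_pod = {}  # pod name -> IP, one-to-one

    def allocate(self, pod_name):
        """Return the pod's existing IP, or bind a new one on first request."""
        if pod_name in self._by_pod:
            return self._by_pod[pod_name]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        ip = self._free.pop(0)
        self._by_pod[pod_name] = ip
        return ip

    def release(self, pod_name):
        """Return a pod's IP to the pool when the pod is retired for good."""
        ip = self._by_pod.pop(pod_name, None)
        if ip is not None:
            self._free.append(ip)


gipam = Gipam("10.0.0.0/29")
first = gipam.allocate("web-0")
gipam.allocate("web-1")
# A redeployed "web-0" asks again and receives the same address.
assert gipam.allocate("web-0") == first
```

Keying on the pod name works because StatefulSet pods keep stable, ordinal names (such as web-0) across restarts.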
Image Management – Each private‑cloud IDC runs a Harbor instance, and together these form a federated Harbor cluster. DNS hijacking directs Docker pushes to the local IDC's Harbor, which asynchronously syncs images to the Harbors in other IDCs. In AWS, a dedicated Harbor backed by S3 stores images; DNS hijacking again ensures that the same image domain is used in every environment.
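The effect of the DNS override can be sketched as a lookup table: the image reference never changes, only the address the domain resolves to. The domain, environment names, and addresses below are all hypothetical placeholders, not Ctrip's actual values.

```python
# Hypothetical image domain shared by every environment.
IMAGE_DOMAIN = "hub.example.com"

# Hypothetical per-environment DNS answers for that domain:
# each environment is pointed at its own local Harbor.
HARBOR_BY_ENVIRONMENT = {
    "idc-a": "10.1.0.10",
    "idc-b": "10.2.0.10",
    "aws": "172.31.0.10",
}


def image_ref(repository, tag):
    """Build an image reference that is identical in all environments."""
    return f"{IMAGE_DOMAIN}/{repository}:{tag}"


def resolve(domain, environment):
    """Mimic the overridden DNS answer seen from a given environment."""
    if domain != IMAGE_DOMAIN:
        raise LookupError(f"no override for {domain}")
    return HARBOR_BY_ENVIRONMENT[environment]


ref = image_ref("demo/app", "1.0")
for env in ("idc-a", "aws"):
    print(env, ref, "->", resolve(IMAGE_DOMAIN, env))
```

Because deployment manifests only ever contain the shared domain, the same manifest can be applied in an IDC or in AWS without rewriting image paths.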
Logging and Monitoring – The public‑cloud side mirrors the private‑cloud stack using Prometheus, Telegraf, InfluxDB, Grafana, and Hickwall. Metrics and alerts are processed in‑cloud and integrated with the NOC for 24/7 monitoring.
Data Synchronization from Private to Public Cloud – Large data transfers go over the public internet to avoid costly dedicated lines. Services are placed in public subnets with public IPs so that their traffic bypasses the NAT gateway and its data‑processing fees. In this architecture, data collection agents in public subnets push data to AWS.
Operational Issues Encountered
An AWS ELB whose targets are registered by instance ID does not support loopback routing (a request routed back to the instance that sent it), which caused API‑server timeouts; registering targets by IP address resolves the issue.
The kubelet max‑pods setting must be lowered to match the limited number of ENIs an instance can attach.
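The max‑pods constraint follows from simple arithmetic under the one‑ENI‑per‑pod design. The sketch below is an illustration, not Ctrip's tooling: the function names are hypothetical, the ENI count is an example input rather than a figure for any specific instance type, and it conservatively assumes that every pod needs its own ENI and that one ENI is reserved for the node itself.

```python
def max_pods_eni_per_pod(enis_per_instance, reserved_enis=1):
    """Pods a node can host when each pod consumes one whole ENI.

    reserved_enis covers interfaces the node keeps for itself
    (assumed here to be just the primary ENI).
    """
    if enis_per_instance <= reserved_enis:
        raise ValueError("instance has no ENIs left for pods")
    return enis_per_instance - reserved_enis


def kubelet_max_pods_flag(enis_per_instance):
    """Render the kubelet --max-pods flag for a given ENI limit."""
    return f"--max-pods={max_pods_eni_per_pod(enis_per_instance)}"


# Example input only: an instance type that allows 8 ENIs.
print(kubelet_max_pods_flag(8))
```

Without such a cap, the scheduler could place more pods on a node than it has attachable ENIs, and the surplus pods would never receive a network interface.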
Conclusion – The article outlines the background, requirements, and detailed design of Ctrip's self‑managed Kubernetes on AWS, covering network, image distribution, monitoring, data sync, and lessons learned, and invites further community exchange.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
