
Ctrip International Ticketing Cloud‑Native Migration: Infrastructure as Code, Logging, Monitoring, and Cost Optimization

This article shares Ctrip International Ticketing’s cloud‑native migration experience. It covers infrastructure as code with Terraform, managed Kubernetes, centralized logging and monitoring with Elasticsearch, Prometheus, Grafana, and Thanos, and practical cost‑optimization techniques such as auto‑scaling, spot instances, storage tiering, and network proxying.

Ctrip Technology

Background – To support overseas users, Ctrip International Ticketing sources data from global suppliers and runs services in many regions. Public cloud was chosen over building private data centers for flexibility and cost.

Cloud‑Native Adoption – The team follows established cloud‑native practices to build scalable, highly available, loosely coupled applications, with the goal of delivering services quickly and at low cost.

2.1 Infrastructure as Code – All infrastructure is defined in version‑controlled IaC repositories alongside application code. Terraform is used for declarative provisioning of managed Kubernetes clusters and other resources, enabling reproducible environments and CI/CD integration.
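
As an illustration of declarative provisioning, the sketch below generates a Terraform configuration in Terraform's JSON syntax (`*.tf.json`) from Python. It assumes AWS EKS as the managed Kubernetes service. The article names S3 and Lambda but not the specific Kubernetes offering. The cluster name, IAM role ARN, subnet IDs, and region are placeholders. A real cluster would also need node groups, networking, and IAM policies.

```python
import json

def eks_cluster_config(name: str, role_arn: str, subnet_ids: list, region: str) -> dict:
    """Build a minimal Terraform JSON document for one EKS cluster (assumed provider: AWS)."""
    return {
        "terraform": {
            "required_providers": {"aws": {"source": "hashicorp/aws"}},
        },
        "provider": {"aws": {"region": region}},
        "resource": {
            "aws_eks_cluster": {
                # "main" is the Terraform resource label; `name` is the cluster name in AWS.
                "main": {
                    "name": name,
                    "role_arn": role_arn,
                    "vpc_config": {"subnet_ids": subnet_ids},
                }
            }
        },
    }

def render(config: dict) -> str:
    """Serialize deterministically so diffs in version control stay readable."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Committing the output of `render(...)` as `main.tf.json` in the IaC repository lets the CI/CD pipeline run `terraform plan` and `terraform apply` against it like any hand-written `.tf` file.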

2.2 Logging – A managed Elasticsearch service is used for log storage. Applications write their logs to stdout/stderr. A log‑collection agent, running as a DaemonSet on every node, picks the logs up and forwards them to Elasticsearch, where they are processed and visualized in Kibana. This keeps logging concerns out of business code.
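
Collection happens at the node level, so the application only has to write one parseable record per line to stdout. Below is a minimal sketch using Python's standard `logging` module. The logger name `ticketing` and the extra fields `route` and `supplier` are hypothetical and serve only as illustrations.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Hypothetical business fields that callers may attach via `extra=`.
EXTRA_FIELDS = ("route", "supplier")

class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        for field in EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)

def configure_stdout_logging(stream=None) -> logging.Logger:
    """Send the application's logs to stdout; the node-level collector does the rest."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger("ticketing")
    logger.handlers[:] = [handler]  # replace, so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage: `log = configure_stdout_logging()` followed by `log.info("fare search done", extra={"route": "SHA-SIN"})`. The emitted line is plain JSON, so the DaemonSet agent can parse it without any application-specific configuration.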

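2.3 Monitoring – The monitoring stack consists of Prometheus and Grafana, deployed with the Prometheus Operator. It is integrated with Thanos for high availability, long‑term storage, and multi‑cluster aggregation. The Thanos Sidecar uploads Prometheus data blocks to S3, and Thanos Compact compacts and downsamples older blocks. Downsampling mainly speeds up long‑range queries. Storage costs are kept down by giving raw‑resolution data a shorter retention period than the downsampled data.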
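
The sketch below makes the downsampling step concrete. It groups raw `(timestamp, value)` samples into fixed windows and keeps the count, sum, min, and max for each window. This is similar in spirit to the aggregates Thanos Compact writes for its 5‑minute and 1‑hour resolutions. It is a deliberate simplification. Thanos works on TSDB blocks rather than Python lists, and it also stores a counter‑specific aggregate so that `rate()` remains correct over downsampled data.

```python
from dataclasses import dataclass

@dataclass
class Aggregate:
    count: int
    total: float
    minimum: float
    maximum: float

def downsample(samples, resolution_s: int = 300) -> dict:
    """Group (timestamp_s, value) samples into windows of `resolution_s` seconds.

    Returns {window_start: Aggregate}, ordered by window start. The average for a
    window can be recovered as total / count.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % resolution_s  # align to the window boundary
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = Aggregate(1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return dict(sorted(buckets.items()))
```

A dashboard spanning weeks can read one aggregate per 5‑minute or 1‑hour window instead of every raw sample. That is where the query speed‑up comes from.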

3.1 Compute Cost Optimization – Elastic scaling combines two components. The Kubernetes Horizontal Pod Autoscaler (HPA) adjusts pod replica counts based on load metrics. The Cluster Autoscaler adds nodes when pods cannot be scheduled and removes underused ones. Spot (bid) instances are mixed with on‑demand instances. Node affinity steers interruption‑tolerant workloads onto cheaper spot capacity, while critical services stay on stable on‑demand nodes.
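
Kubernetes documents the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No scaling action is taken while that ratio stays within a tolerance, 0.1 by default. The sketch reproduces this rule and also builds pod‑spec fragments for the spot/on‑demand split. Several pieces are assumptions rather than the article's setup:

- The `lifecycle=spot` node label and taint are hypothetical. Cloud providers and node‑group tools use their own label names.
- The taint is an addition that keeps pods without the matching toleration off spot nodes.
- The Cluster Autoscaler's node‑level logic is not modeled.

```python
import math

# Hypothetical label/taint key and value applied to spot nodes.
SPOT_KEY, SPOT_VALUE = "lifecycle", "spot"

def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Apply the HPA rule: ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: skip scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the HorizontalPodAutoscaler object.
    return max(min_replicas, min(max_replicas, desired))

def spot_tolerant_placement() -> dict:
    """Pod spec fragment: tolerate the spot taint and prefer spot nodes."""
    match = {"key": SPOT_KEY, "operator": "In", "values": [SPOT_VALUE]}
    return {
        "tolerations": [{"key": SPOT_KEY, "operator": "Equal",
                         "value": SPOT_VALUE, "effect": "NoSchedule"}],
        "affinity": {"nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "preference": {"matchExpressions": [match]}}
            ]
        }},
    }

def critical_placement() -> dict:
    """Pod spec fragment: require nodes that are not labeled as spot."""
    match = {"key": SPOT_KEY, "operator": "NotIn", "values": [SPOT_VALUE]}
    return {
        "affinity": {"nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [match]}]
            }
        }},
    }
```

Take a target of 50% CPU with an observed average of 90% across 4 pods. `hpa_desired_replicas(4, 90, 50)` returns 8, the ceiling of 7.2. The tolerant workloads use a preferred rather than a required affinity. This lets them fall back to on‑demand nodes when spot capacity is reclaimed.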

3.2 Storage Cost Optimization – Historical log and monitoring data are periodically snapshotted and moved to low‑cost object storage. The lightweight backup scripts run as serverless functions (e.g., AWS Lambda), which are billed only for execution time.
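
A storage‑tiering job of this kind can fit in a single function. The sketch rests on three assumptions:

- Daily log indices are named `logs-YYYY.MM.DD` (hypothetical).
- An Elasticsearch snapshot repository backed by S3 is already registered.
- The event payload (also hypothetical) carries the endpoint, the repository name, and the current index list.

The function selects indices older than a retention window and sends one request to the Elasticsearch snapshot API (`PUT _snapshot/<repository>/<snapshot>`). Two steps are left out: request signing, which managed AWS domains typically require, and deleting indices after a successful snapshot.

```python
import json
import urllib.request
from datetime import date, datetime, timedelta, timezone

INDEX_PREFIX = "logs-"      # hypothetical daily index naming: logs-YYYY.MM.DD
DATE_FORMAT = "%Y.%m.%d"

def indices_older_than(index_names: list, today: date, keep_days: int) -> list:
    """Return daily log indices whose date falls before the retention cutoff."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(INDEX_PREFIX):
            continue
        try:
            day = datetime.strptime(name[len(INDEX_PREFIX):], DATE_FORMAT).date()
        except ValueError:
            continue  # not a daily log index; leave it alone
        if day < cutoff:
            old.append(name)
    return sorted(old)

def build_snapshot_request(endpoint: str, repository: str, snapshot: str,
                           indices: list) -> urllib.request.Request:
    """Prepare a PUT to the Elasticsearch snapshot API for the given indices."""
    body = json.dumps({"indices": ",".join(indices), "include_global_state": False})
    return urllib.request.Request(
        f"{endpoint}/_snapshot/{repository}/{snapshot}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def handler(event, context):
    """Lambda entry point; the event fields are hypothetical."""
    now = datetime.now(timezone.utc)
    old = indices_older_than(event["indices"], now.date(), event.get("keep_days", 7))
    if not old:
        return {"snapshotted": []}
    request = build_snapshot_request(event["endpoint"], event["repository"],
                                     f"archive-{now:%Y.%m.%d-%H%M%S}", old)
    # NOTE: managed AWS domains usually require SigV4-signed requests; signing is omitted here.
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()
    return {"snapshotted": old}
```

The function only runs for the seconds it takes to send one request, so a scheduled trigger keeps the cost of this step close to zero.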

3.3 Network Cost Optimization – Ticket queries to external suppliers generate heavy outbound traffic. This traffic is routed through a transparent Squid proxy deployed in a private subnet, which allows the team to use outbound‑only pricing models.

Conclusion – By adopting cloud‑native practices, Ctrip International Ticketing built a stable, automated production environment that accelerates delivery, improves elasticity, and reduces operational costs, while enabling rapid feedback through centralized logging and monitoring.
