
Ctrip International Ticketing Cloud‑Native Migration: Infrastructure as Code, Logging, Monitoring, and Cost Optimization

This article shares Ctrip International Ticketing’s cloud‑native migration experience. It covers infrastructure as code with Terraform, managed Kubernetes, centralized logging and monitoring with Elasticsearch, Prometheus, Grafana and Thanos, and practical cost‑optimization techniques such as auto‑scaling, spot instances, storage tiering and network proxying.

Ctrip Technology

Mar 4, 2021


Background – To support overseas users, Ctrip International Ticketing sources data from global suppliers and runs services in many regions. Public cloud was chosen over building private data centers for flexibility and cost.

Cloud‑Native Adoption – The team follows established cloud‑native standards to build scalable, highly available, loosely coupled applications, focusing on rapid, low‑cost service delivery.

2.1 Infrastructure as Code – All infrastructure is defined in version‑controlled IaC repositories alongside application code. Terraform is used for declarative provisioning of managed Kubernetes clusters and other resources, enabling reproducible environments and CI/CD integration.
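The declarative idea behind this section can be sketched in a few lines: the repository states the desired resources, and a tool works out which actions move the live environment toward that state. The sketch below is only an illustration of that diffing step, not Terraform's implementation; the function name `plan` and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple

# A resource is described by a plain dict of attributes (illustrative only).
Resource = Dict[str, object]


def plan(desired: Dict[str, Resource],
         current: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) pairs that move `current` to `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))   # declared but not yet provisioned
        elif desired[name] != current[name]:
            actions.append(("update", name))   # provisioned, but attributes drifted
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))   # provisioned but no longer declared
    return actions


if __name__ == "__main__":
    desired = {
        "k8s_cluster": {"node_count": 5},
        "log_bucket": {"storage_class": "standard"},
    }
    current = {
        "k8s_cluster": {"node_count": 3},
        "old_vm": {"size": "large"},
    }
    for action, name in plan(desired, current):
        print(action, name)
```

Because the plan is computed from version‑controlled definitions, the same input produces the same set of actions in every environment, which is what makes the step safe to run from a CI/CD pipeline.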

2.2 Logging – A managed Elasticsearch service is used for log storage. Applications write their logs to stdout/stderr; a collector running as a DaemonSet on each node picks them up and forwards them to Elasticsearch, where they are processed and visualized in Kibana. This decouples logging from business code.
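From the application's side, this decoupling means the service only has to write structured lines to stdout and needs no knowledge of Elasticsearch. A minimal sketch using Python's standard `logging` module follows; the JSON field names and the logger name `flight-search` are assumptions for illustration, not taken from the original system.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args to the message
        })


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout (or a given stream)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any handlers set up earlier
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("flight-search")  # hypothetical service name
    log.info("quote request served for route %s", "SHA-LAX")
```

One JSON object per line is a common choice because a node‑level collector can parse it without service‑specific rules.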

2.3 Monitoring – The monitoring stack consists of Prometheus and Grafana, deployed with the Prometheus Operator and integrated with Thanos for high availability, long‑term storage, and multi‑cluster aggregation. The Thanos Sidecar uploads data to S3, while the Thanos Compactor downsamples old data to reduce storage costs.
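Downsampling saves space by replacing many raw samples with one summary per time window. The sketch below illustrates the idea only and is not Thanos code; the function name `downsample`, the choice of aggregates (count, sum, min, max), and the five‑minute window in the usage example are assumptions.

```python
from typing import Dict, Iterable, Tuple


def downsample(samples: Iterable[Tuple[int, float]],
               window_s: int) -> Dict[int, Dict[str, float]]:
    """Group (unix_timestamp, value) samples into fixed windows of `window_s` seconds.

    Each window keeps count, sum, min and max, so averages and extremes
    can still be queried after the raw samples are dropped.
    """
    buckets: Dict[int, Dict[str, float]] = {}
    for ts, value in samples:
        start = ts - ts % window_s  # align the timestamp to its window start
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    raw = [(0, 1.0), (15, 3.0), (300, 2.0)]
    for start, agg in downsample(raw, window_s=300).items():
        print(start, agg)
```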

3.1 Compute Cost Optimization – Elastic scaling is achieved with the Kubernetes Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler, which automatically adjust pod replicas and node counts based on load. Spot instances are mixed with on‑demand instances: node affinity places interruption‑tolerant workloads on cheaper spot capacity while critical services stay on stable on‑demand nodes.
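The scaling rule documented for the HPA is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a small tolerance of 1. A simplified sketch of that calculation follows. It ignores stabilization windows and pod readiness, and the default bounds are placeholders rather than values from the original deployment.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified form of the HPA replica calculation."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target scale out.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

Scaling pods changes how much capacity is requested; the Cluster Autoscaler then adds or removes nodes so that the requested pods can be scheduled.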

3.2 Storage Cost Optimization – Historical log and monitoring data are periodically snapshotted and moved to low‑cost object storage. The lightweight backup scripts run as serverless functions (e.g., AWS Lambda), which are billed only for execution time.
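The selection step of such a backup script can be sketched as a pure function that decides which daily log indices are old enough to archive. Everything specific here is an assumption for illustration: the `logs-YYYY.MM.DD` naming scheme, the seven‑day hot window, and the shape of the event passed to the handler. The snapshot and upload calls themselves are omitted.

```python
from datetime import date, timedelta
from typing import Iterable, List


def indices_to_archive(index_names: Iterable[str],
                       today: date,
                       hot_days: int = 7,
                       prefix: str = "logs-") -> List[str]:
    """Return daily indices older than `hot_days`, assuming names like logs-2021.03.04."""
    cutoff = today - timedelta(days=hot_days)
    stale: List[str] = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a log index
        try:
            year, month, day = (int(p) for p in name[len(prefix):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # name does not follow the assumed date pattern
        if index_day < cutoff:
            stale.append(name)
    return sorted(stale)


def handler(event, context):
    """Lambda-style entry point; the event shape {"indices": [...]} is hypothetical."""
    stale = indices_to_archive(event.get("indices", []), date.today())
    # A real script would snapshot these indices and move them to object storage here.
    return {"archive": stale}
```

Keeping the date logic separate from the entry point makes it easy to test locally without any cloud credentials.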

3.3 Network Cost Optimization – For outbound‑heavy ticket queries, a transparent Squid proxy is deployed in a private subnet to route external traffic, allowing the use of outbound‑only pricing models.

Conclusion – By adopting cloud‑native practices, Ctrip International Ticketing built a stable, automated production environment that accelerates delivery, improves elasticity, and reduces operational costs, while enabling rapid feedback through centralized logging and monitoring.


