
How Ctrip International Flights Scaled with Cloud‑Native Practices and Cut Costs

This article shares Ctrip International Flights' cloud‑native journey, detailing how they adopted infrastructure‑as‑code, automated CI/CD pipelines, centralized logging and monitoring with Prometheus, Grafana and Thanos, and applied elastic scaling, spot instances, and network optimizations to reduce operational costs while maintaining high availability.

dbaplus Community

Background

Ctrip International Ticketing aggregates flight data from suppliers worldwide and operates services in regions such as the US, Germany, and Singapore. To avoid the cost and operational burden of building private data centers, the team adopted public‑cloud services and followed cloud‑native best practices to achieve scalability, high availability, and loose coupling.

Cloud Migration Practices

1. Infrastructure as Code (IaC)

All cloud resources are defined in a dedicated Terraform repository. Each change is version‑controlled, reviewed, and applied automatically through CI/CD pipelines. The workflow is:

Developer pushes Terraform code to the IaC repo.

CI pipeline runs terraform fmt, terraform validate, and terraform plan.

After peer review, the pipeline executes terraform apply against the target environment.
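The steps above can be sketched as a CI workflow definition. This is a hypothetical example in GitHub Actions syntax; the job names, triggers, and runner image are illustrative assumptions, not Ctrip's actual pipeline:

```yaml
# Hypothetical CI workflow for the IaC repo: fmt/validate/plan on
# every pull request, apply only after merge to the main branch.
name: terraform
on:
  pull_request:
  push:
    branches: [main]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive   # style gate
      - run: terraform init
      - run: terraform validate                # syntax and type checks
      - run: terraform plan                    # reviewed alongside the PR

  apply:
    if: github.ref == 'refs/heads/main'        # only after merge
    needs: plan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve
```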

Managed Kubernetes services (e.g., Amazon EKS, Azure AKS, GKE) are used so that a production‑grade cluster can be provisioned in minutes, eliminating the need for manual control‑plane management.
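Provisioning such a cluster from the Terraform repo might look like the following. This is a minimal sketch assuming AWS and the community terraform-aws-modules/eks module; the cluster name, versions, variables, and node sizes are placeholder assumptions:

```hcl
# Illustrative EKS cluster definition; all names and sizes are
# placeholders, not Ctrip's actual configuration.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "ticketing-us-east"
  cluster_version = "1.29"

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  eks_managed_node_groups = {
    default = {
      instance_types = ["m5.large"]
      min_size       = 3
      max_size       = 10
      desired_size   = 3
    }
  }
}
```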

2. Centralized Logging

A DaemonSet runs a log‑collector (e.g., Fluent Bit) on every node. The collector reads container stdout/stderr streams, enriches them with metadata, and forwards the logs to a managed Elasticsearch cluster. Applications do not need to embed logging libraries; they simply write to standard output.

# Example DaemonSet manifest (simplified)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
spec:
  selector:
    matchLabels:
      name: fluent-bit
  template:
    metadata:
      labels:
        name: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.9
        args: ["-c", "/fluent-bit/etc/fluent-bit.conf"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

Logs are visualized in Kibana, and retention policies are managed at the Elasticsearch level.
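The collector's pipeline itself lives in fluent-bit.conf, referenced by the DaemonSet args above. A minimal sketch, where the Elasticsearch host and index prefix are assumptions:

```ini
# Illustrative fluent-bit.conf: tail container logs, enrich with
# Kubernetes metadata, then ship to Elasticsearch.
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Tag               kube.*

[FILTER]
    Name              kubernetes
    Match             kube.*

[OUTPUT]
    Name              es
    Match             *
    Host              elasticsearch.logging.example.com   # assumed endpoint
    Port              9200
    Logstash_Format   On
    Logstash_Prefix   ctrip-flights                       # assumed index prefix
```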

3. Monitoring and Alerting

The monitoring stack consists of Prometheus, the Prometheus Operator, Thanos, and Grafana.

Prometheus Operator deploys a Prometheus instance per Kubernetes namespace via a CustomResourceDefinition (CRD). This isolates each team's workloads and avoids relying on one monolithic Prometheus as a single point of failure.

Thanos Sidecar runs alongside each Prometheus, uploading raw blocks to an S3 bucket every two hours.

Thanos Query aggregates data from all sidecars and the object store, providing a global query endpoint.

Thanos Compact downsamples and compresses older blocks to reduce storage cost.

Grafana connects only to the Thanos Query endpoint, giving a unified view of metrics across all services.
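That single-datasource setup can be expressed in a Grafana provisioning file. The service name and port below are assumptions about how Thanos Query is exposed in the cluster:

```yaml
# Illustrative Grafana datasource provisioning: one Prometheus-type
# source pointing at the Thanos Query service.
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus
    access: proxy
    url: http://thanos-query.monitoring.svc:9090   # assumed service/port
    isDefault: true
```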

Example Prometheus Operator configuration (simplified):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-us-east
  namespace: us-east
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: ticketing
  # Thanos sidecar described above; the image tag and the secret
  # holding the bucket config are illustrative.
  thanos:
    image: quay.io/thanos/thanos:v0.31.0
    objectStorageConfig:
      name: thanos-objstore-secret
      key: objstore.yml
  resources:
    requests:
      memory: 2Gi
      cpu: 500m
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: gp2
        resources:
          requests:
            storage: 50Gi

Grafana dashboards query the Thanos endpoint, eliminating the need to configure multiple data sources.
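The serviceMonitorSelector in the Prometheus resource picks up ServiceMonitor objects labeled team: ticketing. A matching resource might look like this; the service name, labels, and port name are assumptions:

```yaml
# Illustrative ServiceMonitor matched by the team: ticketing selector.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flight-search
  namespace: us-east
  labels:
    team: ticketing          # matched by serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: flight-search     # labels on the target Service
  endpoints:
    - port: metrics          # named port on that Service
      interval: 30s
      path: /metrics
```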

[Figure: Thanos architecture]
Tags: cloud-native, Infrastructure as Code, cost-optimization
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS and DAMS conferences, delivered by industry experts.
