How Ctrip Scaled AI Model Access with Higress: Architecture, Challenges, and Solutions
Ctrip’s R&D team built an AI gateway using Higress to unify access to diverse large‑model services, addressing authentication, traffic control, fault tolerance, monitoring, and integration with internal MCP platforms, while sharing practical lessons and future plans.
Challenges in Large‑Scale AI Service Integration
Multiple external and internal large‑model services require different network routes and authentication mechanisms.
Cost accounting and usage statistics are siloed across business units, making global quota management impossible.
During traffic spikes or model‑service failures there is no unified rate‑limiting, circuit‑breaking, or traffic‑shaping, so each team implements its own ad‑hoc solution.
Why Higress Was Chosen as the AI Gateway
Built on Alibaba’s mature API‑gateway stack and extended with AI‑specific features.
Uses Istio as the control plane and Envoy as the data plane; supports Wasm plugins written in C++, Go, or Rust for custom logic.
Active community releases a new version every 2‑3 weeks, providing rapid feature delivery.
Architecture Overview
The gateway runs entirely inside Ctrip’s Kubernetes clusters and consists of three core components:
Gateway (data plane) – receives external traffic and forwards it to backend model services.
Controller (control plane) – watches custom Kubernetes resources and pushes configuration to the Gateway.
Management API – integrates with Ctrip’s internal MCP platform; administrators register model services, define routing rules, and persist configuration as Kubernetes resources.
Configuration uses native Kubernetes objects plus custom resources. Each consumer is assigned a distinct path that maps to one or more model routes. A model route can load‑balance across multiple backend instances. Model‑name aliasing lets callers use a unified name while the gateway rewrites it to the concrete name required by each backend.
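As a rough illustration of the aliasing step, here is a minimal Go sketch; the alias table, backend names, and the `rewriteModel` helper are all hypothetical, not Higress’s actual schema:

```go
package main

import "fmt"

// aliasMap is a hypothetical alias table: the unified model name that
// callers use, mapped to the concrete name each backend expects.
var aliasMap = map[string]map[string]string{
	"gpt-4o": {
		"vendor-a": "gpt-4o-2024-08-06",
		"vendor-b": "gpt4o-proxy",
	},
}

// rewriteModel returns the backend-specific model name for a unified name,
// falling back to the unified name when no alias is configured.
func rewriteModel(unified, backend string) string {
	if m, ok := aliasMap[unified]; ok {
		if concrete, ok := m[backend]; ok {
			return concrete
		}
	}
	return unified
}

func main() {
	fmt.Println(rewriteModel("gpt-4o", "vendor-a")) // gpt-4o-2024-08-06
	fmt.Println(rewriteModel("gpt-4o", "vendor-c")) // gpt-4o (no alias registered)
}
```

In the real gateway this rewrite happens inside a Wasm plugin on the request body, but the lookup logic is the same shape.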
Traffic Governance
Rate limiting: Consumers specify limits in TPM (tokens per minute), QPM (queries per minute), or concurrent requests. A Higress Wasm plugin stores the counters in Redis and updates them atomically via Lua scripts.
Fallback/degradation: If the primary model returns a 4xx/5xx response, the gateway automatically retries the request against a pre‑configured fallback model, applying independent name mapping where needed.
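A minimal Go sketch of this retry-on-error flow, assuming a hypothetical `callFn` stand-in for the upstream HTTP call (the real gateway forwards a request and inspects the status code):

```go
package main

import (
	"errors"
	"fmt"
)

// callFn is a hypothetical stand-in for invoking an upstream model.
type callFn func(model string) (status int, body string)

// withFallback tries the primary model and, on a 4xx/5xx status, retries
// once against the configured fallback model (whose own name mapping is
// assumed to have been applied by the routing layer).
func withFallback(call callFn, primary, fallback string) (string, error) {
	if status, body := call(primary); status < 400 {
		return body, nil
	}
	if status, body := call(fallback); status < 400 {
		return body, nil
	}
	return "", errors.New("primary and fallback both failed")
}

func main() {
	flaky := func(model string) (int, string) {
		if model == "primary-model" {
			return 503, "" // simulate an upstream outage
		}
		return 200, "ok from " + model
	}
	body, err := withFallback(flaky, "primary-model", "backup-model")
	fmt.Println(body, err) // ok from backup-model <nil>
}
```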
Logging & monitoring: Logs are written locally, rotated by logrotate, and enriched with the model name, token usage, and request/response payloads via custom Wasm log templates. Filebeat ships the logs to Kafka; after processing they are stored in ClickHouse and visualized in Kibana. Metrics are exposed on a Prometheus endpoint and displayed in Grafana.
Integration with MCP Services
The gateway also exposes traditional HTTP APIs as MCP services. OpenAPI/Swagger contracts are converted into tool‑description formats consumable by LLMs. Ctrip uses LLMs to auto‑generate these descriptions, followed by minimal human review.
For SSE‑based MCP calls (still required by legacy clients), the gateway creates a session ID, subscribes to a Redis channel, and forwards responses back to the client via the SSE stream.
Key Implementation Details
Model routing: Consumer‑specific paths map to multiple model routes; each route can balance across several backend services. Model‑name aliasing enables a unified request format.
Authentication: Clients present a Bearer token that maps to a consumer record defining the services they may call. Backend credentials are stored in the gateway and injected into outbound requests; for MCP services the gateway can either inject stored credentials or forward client‑provided ones, depending on configuration.
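A minimal Go sketch of the token-to-consumer lookup and credential injection (the `consumer` record, its fields, and the sample tokens are illustrative, not the gateway’s real data model):

```go
package main

import (
	"fmt"
	"strings"
)

// consumer records which model services a bearer token may call and which
// backend credential the gateway injects on its behalf.
type consumer struct {
	Name       string
	Allowed    map[string]bool
	BackendKey string
}

// tokens maps client bearer tokens to consumer records (sample data).
var tokens = map[string]consumer{
	"tok-team-a": {
		Name:       "team-a",
		Allowed:    map[string]bool{"gpt-4o": true},
		BackendKey: "sk-backend-secret",
	},
}

// authorize resolves the Authorization header to a consumer, checks the
// requested model, and on success returns the credential to inject upstream.
func authorize(authHeader, model string) (string, error) {
	token := strings.TrimPrefix(authHeader, "Bearer ")
	c, ok := tokens[token]
	if !ok {
		return "", fmt.Errorf("unknown token")
	}
	if !c.Allowed[model] {
		return "", fmt.Errorf("consumer %s may not call %s", c.Name, model)
	}
	return c.BackendKey, nil
}

func main() {
	key, err := authorize("Bearer tok-team-a", "gpt-4o")
	fmt.Println(key, err) // sk-backend-secret <nil>
}
```

The key point of the design is that clients never see backend credentials; the gateway swaps the client token for the stored key on the way out.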
Protocol adaptation: The gateway normalizes divergent vendor APIs (different paths, auth schemes) to a common OpenAI‑compatible interface, using Wasm plugins for request/response transformation.
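A sketch of the normalization table, assuming hypothetical vendor deviations in path and auth-header name (the `adapter` type and the registered values are invented for illustration):

```go
package main

import "fmt"

// adapter describes how one vendor deviates from the OpenAI-style
// interface the gateway exposes to its callers.
type adapter struct {
	Path       string // vendor path replacing /v1/chat/completions
	AuthHeader string // header name the vendor expects the key in
}

// adapters holds per-backend deviations (sample data).
var adapters = map[string]adapter{
	"vendor-a": {Path: "/api/v2/chat", AuthHeader: "X-Api-Key"},
}

// adaptRequest returns the outbound path and auth header for a backend,
// defaulting to the OpenAI-compatible form when no adapter is registered.
func adaptRequest(backend string) (path, authHeader string) {
	if a, ok := adapters[backend]; ok {
		return a.Path, a.AuthHeader
	}
	return "/v1/chat/completions", "Authorization"
}

func main() {
	p, h := adaptRequest("vendor-a")
	fmt.Println(p, h) // /api/v2/chat X-Api-Key
}
```

Request/response body rewriting (field renames, streaming-chunk formats) follows the same pattern, just applied to payloads instead of paths and headers.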
SSE session handling: Upon an SSE request, the gateway generates a SessionID, listens on a Redis channel keyed by that ID, returns the endpoint to the client, and publishes downstream responses to the channel for delivery over the SSE connection.
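The session flow can be sketched in Go, with a buffered channel standing in for the Redis pub/sub channel and an illustrative endpoint URL shape (the helper names below are hypothetical):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSessionID generates a random per-connection SessionID; the real
// gateway keys a Redis channel on it.
func newSessionID() string {
	b := make([]byte, 16)
	rand.Read(b)
	return hex.EncodeToString(b)
}

// sessions maps SessionID -> delivery channel (stand-in for Redis pub/sub).
var sessions = map[string]chan string{}

// openSession registers a session and returns the message endpoint the
// client should POST to (the URL shape here is illustrative).
func openSession() (id, endpoint string) {
	id = newSessionID()
	sessions[id] = make(chan string, 8)
	return id, "/mcp/messages?sessionId=" + id
}

// publish pushes a downstream response onto the session's channel, from
// where the SSE handler would write it to the open stream.
func publish(id, msg string) {
	if ch, ok := sessions[id]; ok {
		ch <- msg
	}
}

func main() {
	id, endpoint := openSession()
	publish(id, "tool result")
	fmt.Println(endpoint != "", <-sessions[id]) // true tool result
}
```

Keying delivery on the SessionID is what lets the publisher and the SSE connection live on different gateway instances: any instance can publish to Redis, and only the instance holding the stream consumes it.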
Outcomes and Future Work
The AI gateway now provides stable, scalable access to multiple large‑model providers and MCP services across Ctrip. Planned enhancements include:
More expressive model‑routing rules (e.g., weight‑based, content‑aware).
Post‑processing of model outputs.
Consumer priority detection and quota enforcement.
Content‑safety safeguards.
Deeper security and compliance integration.
All core features are open‑source in the Higress project; Ctrip contributes back via pull requests to address gaps discovered during internal adoption.
Alibaba Cloud Native