Design and Implementation of Ctrip's High‑Performance API Gateway
This article presents the evolution, architecture, and engineering practices of Ctrip's API gateway, covering its transition to full asynchronous processing with Netty and RxJava, stream‑based forwarding, single‑threaded event‑loop design, performance optimizations, multi‑protocol support, routing, and modular governance for large‑scale microservice environments.
Since its first release in 2014, Ctrip's API gateway has become the standard entry point for external traffic, evolving alongside the company's micro‑service architecture to handle over 3,000 services and 200 billion daily requests as of July 2021.
The early implementation followed Netflix OSS concepts: a Zuul‑like core running on Tomcat NIO with asynchronous servlets, Apache HttpClient for synchronous client calls, and Archaius, Hystrix, and Groovy for dynamic configuration, circuit breaking, and hot updates respectively.
To address growing latency and thread‑pool pressure caused by overseas traffic and service‑scale expansion, Ctrip embarked on a fully asynchronous redesign. The new design unifies asynchronous handling across the server side, business processing, and client side, using Netty (NIO/Epoll + EventLoop) for I/O and RxJava for reactive flow control.
Asynchronous Flow Design
All three layers (server, business, client) are built on Netty, while business logic is transformed into asynchronous stages. Common async scenarios include request validation, authentication, IO events, and request forwarding. The design addresses challenges such as process and state management, exception handling, context propagation, thread scheduling, and traffic control.
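As a rough illustration of how such asynchronous stages might be chained, the sketch below uses the JDK's CompletableFuture as a stand‑in for RxJava and runs asynchronous work on the common pool rather than on a Netty event loop. GatewayContext, Stage, and the three stage constants are hypothetical names chosen for the example, not the gateway's real types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical per-request context that travels through every stage. */
class GatewayContext {
    final String path;
    final Map<String, String> attributes = new HashMap<>();
    GatewayContext(String path) { this.path = path; }
}

class AsyncStageSketch {
    /** A stage takes the context and completes, now or later, with the same context. */
    interface Stage extends Function<GatewayContext, CompletableFuture<GatewayContext>> {}

    // Synchronous stage: the returned future is already completed (or already failed).
    static final Stage VALIDATE = ctx -> {
        if (!ctx.path.startsWith("/")) {
            return CompletableFuture.failedFuture(new IllegalArgumentException("bad path"));
        }
        return CompletableFuture.completedFuture(ctx);
    };

    // Asynchronous stage: pretends a remote authentication call returned.
    static final Stage AUTHENTICATE = ctx -> CompletableFuture.supplyAsync(() -> {
        ctx.attributes.put("user", "demo");
        return ctx;
    });

    // Asynchronous stage: pretends the request was forwarded and answered.
    static final Stage FORWARD = ctx -> CompletableFuture.supplyAsync(() -> {
        ctx.attributes.put("status", "200");
        return ctx;
    });

    /** Chains the stages in order; a failure or timeout skips the rest and yields "500". */
    static CompletableFuture<String> handle(GatewayContext ctx, List<Stage> stages, long timeoutMs) {
        CompletableFuture<GatewayContext> flow = CompletableFuture.completedFuture(ctx);
        for (Stage stage : stages) {
            flow = flow.thenCompose(stage);
        }
        return flow
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                .thenApply(c -> c.attributes.get("status"))
                .exceptionally(err -> "500");
    }
}
```

The context object is passed explicitly from stage to stage, which is one simple way to handle context propagation without thread‑local state.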
Key abstractions include the Processor interface and the AbstractProcessor base class, which expose a unified Maybe<T> result so that synchronous and asynchronous implementations can be handled the same way. The processing engine composes inbound, outbound, error, and log stages using RxJava utilities such as RxUtil.concat, and enforces timeouts, error fallback, and resource cleanup.
// Imports assume RxJava 2 package names
import io.reactivex.Maybe;
import io.reactivex.internal.fuseable.ScalarCallable;
import io.reactivex.schedulers.Schedulers;
import java.util.concurrent.TimeUnit;

public interface Processor<T> {
    ProcessorType getType();
    int getOrder();
    boolean shouldProcess(RequestContext context);
    // Every processor exposes its result as a Maybe, whether it ran synchronously or not
    Maybe<T> process(RequestContext context) throws Exception;
}

public abstract class AbstractProcessor<T> implements Processor<T> {
    // Synchronous processing without a response
    protected void processSync(RequestContext context) throws Exception {}
    // Synchronous processing with a response (e.g., health check)
    protected T processSyncAndGetReponse(RequestContext context) throws Exception {
        processSync(context);
        return null;
    }
    // Asynchronous processing (e.g., auth, remote calls)
    protected Maybe<T> processAsync(RequestContext context) throws Exception {
        T response = processSyncAndGetReponse(context);
        return response == null ? Maybe.empty() : Maybe.just(response);
    }
    @Override
    public Maybe<T> process(RequestContext context) throws Exception {
        Maybe<T> maybe = processAsync(context);
        if (maybe instanceof ScalarCallable) {
            // Maybe.empty() and Maybe.just() are scalar: the result already exists, so no timeout is needed
            return maybe;
        } else {
            // Truly asynchronous path: time out on the request's own event loop
            return maybe.timeout(getAsyncTimeout(context), TimeUnit.MILLISECONDS,
                    Schedulers.from(context.getEventloop()), timeoutFallback(context));
        }
    }
    protected long getAsyncTimeout(RequestContext context) { return 2000; }
    protected Maybe<T> timeoutFallback(RequestContext context) { return Maybe.empty(); }
}
Stream Forwarding & Single‑Threaded Execution
By parsing only the HTTP header and forwarding the body directly, the gateway reduces latency and memory footprint. Streamed processing introduces complexity such as thread‑safety, multi‑stage coordination, and edge‑case handling, which Ctrip mitigates by binding the entire request lifecycle to a single Netty event‑loop, eliminating concurrent state mutations.
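The idea of confining a request to one thread can be sketched with JDK classes alone. In the example below a single‑thread ExecutorService stands in for a Netty EventLoop, and a plain list stands in for the upstream connection; StreamingRequest and EventLoopSketch are hypothetical names. Only the first chunk is inspected as the header, and body chunks are relayed as they arrive rather than being aggregated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** State of one in-flight request; it is only ever touched from its own loop thread. */
class StreamingRequest {
    private final ExecutorService loop;                        // stand-in for a Netty EventLoop
    private final List<String> upstream = new ArrayList<>();   // stand-in for the upstream channel
    private boolean headerParsed = false;                      // plain field: no lock, no volatile

    StreamingRequest(ExecutorService loop) { this.loop = loop; }

    /** May be called from any thread; the state change itself always runs on the loop. */
    void onChunk(String chunk) {
        loop.execute(() -> {
            if (!headerParsed) {
                headerParsed = true;                           // only the header is inspected
                upstream.add("HEADER:" + chunk.trim());
            } else {
                upstream.add(chunk);                           // body chunks are relayed untouched
            }
        });
    }

    /** Reads the state on the loop as well, so callers never race with onChunk. */
    CompletableFuture<List<String>> snapshot() {
        return CompletableFuture.supplyAsync(() -> new ArrayList<>(upstream), loop);
    }
}

class EventLoopSketch {
    static List<String> run(List<String> chunks) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            StreamingRequest request = new StreamingRequest(loop);
            for (String chunk : chunks) {
                request.onChunk(chunk);
            }
            // Tasks run in submission order, so the snapshot sees every chunk above.
            return request.snapshot().join();
        } finally {
            loop.shutdown();
        }
    }
}
```

Because every read and write of the request state is a task on the same single thread, the fields need no synchronization, which is the property the single event‑loop binding relies on.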
Additional optimizations include lazy loading of request fields, off‑heap memory with zero‑copy, and adoption of the ZGC collector in JDK 11 to lower GC pause times.
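Lazy loading of request fields can be pictured as follows. This is a minimal sketch with a hypothetical LazyRequest class, not the gateway's actual request model: the path is derived directly because routing always needs it, while the query string is parsed only if some stage asks for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LazyRequest {
    private final String rawUri;
    private Map<String, String> queryParams;   // stays null until first access
    int parseCount = 0;                        // exposed only to make the laziness observable

    LazyRequest(String rawUri) { this.rawUri = rawUri; }

    /** Cheap and needed on every request, so it is computed directly. */
    String path() {
        int q = rawUri.indexOf('?');
        return q < 0 ? rawUri : rawUri.substring(0, q);
    }

    /** Parsed on first access only, then served from the cached map. */
    Map<String, String> queryParams() {
        if (queryParams == null) {
            parseCount++;
            Map<String, String> parsed = new LinkedHashMap<>();
            int q = rawUri.indexOf('?');
            if (q >= 0 && q + 1 < rawUri.length()) {
                for (String pair : rawUri.substring(q + 1).split("&")) {
                    int eq = pair.indexOf('=');
                    if (eq > 0) {
                        parsed.put(pair.substring(0, eq), pair.substring(eq + 1));
                    }
                }
            }
            queryParams = parsed;
        }
        return queryParams;
    }
}
```

The unsynchronized null check is only safe under the assumption that a request is confined to a single thread, as in the event‑loop design described above.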
Gateway Business Forms
The gateway serves as a decoupling layer for diverse network environments (intranet, internet, IDC zones) and provides universal cross‑cutting concerns such as security, authentication, routing, rate‑limiting, monitoring, and alerting. It also supports private protocols (SOTP), link optimization, and multi‑region active‑active deployments.
Governance & Multi‑Protocol Compatibility
To simplify operations across multiple clusters and protocols, Ctrip introduced a control plane that standardizes protocol adapters, defines a common intermediate model, and manages routing, module orchestration, and gray‑release configurations. Routing rules are expressed in JSON, supporting URI prefix matching, tag‑based access control, and weighted target selection.
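How prefix matching and weighted target selection might be evaluated is sketched below. The class names are hypothetical, and two details are assumptions rather than documented gateway behavior: the longest matching prefix wins when several rules match, and targets are drawn with probability proportional to their weight.

```java
import java.util.List;
import java.util.Random;

class Target {
    final String url;
    final int weight;
    Target(String url, int weight) { this.url = url; this.weight = weight; }
}

class RouteRule {
    final String prefix;
    final List<Target> targets;
    RouteRule(String prefix, List<Target> targets) { this.prefix = prefix; this.targets = targets; }
}

class RoutingSketch {
    /** Returns the rule with the longest matching prefix, or null if no rule matches. */
    static RouteRule match(List<RouteRule> rules, String path) {
        RouteRule best = null;
        for (RouteRule rule : rules) {
            boolean longer = best == null || rule.prefix.length() > best.prefix.length();
            if (path.startsWith(rule.prefix) && longer) {
                best = rule;
            }
        }
        return best;
    }

    /** Picks a target with probability weight / totalWeight; requires totalWeight > 0. */
    static Target pick(List<Target> targets, Random random) {
        int total = 0;
        for (Target t : targets) {
            total += t.weight;
        }
        int roll = random.nextInt(total);      // 0 <= roll < total
        for (Target t : targets) {
            roll -= t.weight;
            if (roll < 0) {
                return t;
            }
        }
        throw new IllegalStateException("not reachable when total weight is positive");
    }
}
```

A rule in the gateway's own JSON format looks like this: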
{
  "type": "uri",
  "value": "/hotel/order",
  "matcherType": "prefix",
  "tags": ["owner_admin", "org_framework", "appId_123456"],
  "properties": {"core": "true"},
  "routes": [{
    "condition": "true",
    "zone": "PRO",
    "targets": [{"url": "http://test.ctrip.com/hotel", "weight": 100}]
  }]
}
Module orchestration allows the control plane to inject custom processing stages (e.g., adding response headers) with configurable order, gray‑ratio, and exception handling.
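One way to picture how such modules could be applied is sketched below. The names are hypothetical and two behaviors are assumptions: a request falls into the gray group when a hash of its id, reduced to a bucket from 0 to 99, is below grayRatio, and an exception handling mode of "return" is read here as "stop running the remaining modules".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class Module {
    final String name;
    final int ruleOrder;                         // lower values run first
    final int grayRatio;                         // 0..100, share of requests the module applies to
    final Consumer<Map<String, String>> action;  // e.g. adds response headers

    Module(String name, int ruleOrder, int grayRatio, Consumer<Map<String, String>> action) {
        this.name = name;
        this.ruleOrder = ruleOrder;
        this.grayRatio = grayRatio;
        this.action = action;
    }
}

class ModuleChainSketch {
    /** Maps the request id to a stable bucket in 0..99 and compares it with the ratio. */
    static boolean inGray(String requestId, int grayRatio) {
        int bucket = Math.floorMod(requestId.hashCode(), 100);
        return bucket < grayRatio;
    }

    /** Runs the modules in ruleOrder; a failing module ends the chain (assumed "return" mode). */
    static void apply(List<Module> modules, String requestId, Map<String, String> headers) {
        List<Module> ordered = new ArrayList<>(modules);
        ordered.sort(Comparator.comparingInt((Module m) -> m.ruleOrder));
        for (Module module : ordered) {
            if (!inGray(requestId, module.grayRatio)) {
                continue;                        // request is outside this module's gray share
            }
            try {
                module.action.accept(headers);
            } catch (RuntimeException e) {
                return;
            }
        }
    }
}
```

A module definition in the gateway's JSON format: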
{
  "name": "addResponseHeader",
  "stage": "PRE_RESPONSE",
  "ruleOrder": 0,
  "grayRatio": 100,
  "condition": "true",
  "actionParam": {
    "connection": "keep-alive",
    "x-service-call": "${request.func.remoteCost}",
    "Access-Control-Expose-Headers": "x-service-call",
    "x-gate-root-id": "${func.catRootMessageId}"
  },
  "exceptionHandle": "return"
}
In summary, Ctrip's self‑developed gateway combines full asynchronous processing, stream‑based forwarding, single‑event‑loop execution, and a flexible control plane to achieve high performance, scalability, and maintainability for billions of daily requests across multiple protocols and regions.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.