How Ctrip Handles 20 Billion Daily Requests with a High‑Performance Java API Gateway
This article explains how Ctrip’s API gateway, built on Java, Netty and RxJava, evolved from a Zuul‑based design into a fully asynchronous architecture in which each request is processed on a single event‑loop thread. The gateway handles about 20 billion requests per day. The article covers core components, streaming forwarding, performance optimizations, governance, multi‑protocol support, routing and module orchestration.
Overview
Ctrip’s API gateway was introduced together with the company’s micro‑service architecture in 2014. By July 2021 it had integrated more than 3,000 services and handled about 20 billion requests per day. Early versions were based on Netflix OSS concepts, especially Zuul 1.0, and used Tomcat NIO, an independent thread pool, Apache HttpClient, Archaius, Hystrix, and Groovy for hot updates.
High‑Performance Core Design
2.1 Asynchronous Flow Design
The gateway now runs fully asynchronously: server‑side async, business‑logic async, and client‑side async, all built on Netty’s NIO/Epoll event‑loop model. Business processes are wrapped in RxJava Maybe streams, providing unified handling of synchronous and asynchronous paths, automatic timeout, error fallback and thread‑scheduling.
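    // RxJava 2.x imports; RequestContext and ProcessorType are gateway-internal types.
    import io.reactivex.Maybe;
    import io.reactivex.internal.fuseable.ScalarCallable;
    import io.reactivex.schedulers.Schedulers;
    import java.util.concurrent.TimeUnit;

    public interface Processor<T> {
        ProcessorType getType();
        int getOrder();
        boolean shouldProcess(RequestContext context);
        // Unified external API
        Maybe<T> process(RequestContext context) throws Exception;
    }

    public abstract class AbstractProcessor<T> implements Processor<T> {
        // Synchronous, no response: override this hook.
        protected void processSync(RequestContext context) throws Exception {}

        // Synchronous, with a response: override this hook. Defaults to processSync and no response.
        protected T processSyncAndGetReponse(RequestContext context) throws Exception {
            processSync(context);
            return null;
        }

        // Asynchronous: override this hook. Defaults to wrapping the synchronous result.
        protected Maybe<T> processAsync(RequestContext context) throws Exception {
            T response = processSyncAndGetReponse(context);
            return response == null ? Maybe.empty() : Maybe.just(response);
        }

        @Override
        public Maybe<T> process(RequestContext context) throws Exception {
            Maybe<T> maybe = processAsync(context);
            if (maybe instanceof ScalarCallable) {
                return maybe; // synchronous shortcut: Maybe.empty()/Maybe.just() need no timeout
            } else {
                // Asynchronous path: apply a timeout on the request's event loop, with a fallback
                return maybe.timeout(getAsyncTimeout(context), TimeUnit.MILLISECONDS,
                        Schedulers.from(context.getEventloop()), timeoutFallback(context));
            }
        }

        protected long getAsyncTimeout(RequestContext context) { return 2000; }
        protected Maybe<T> timeoutFallback(RequestContext context) { return Maybe.empty(); }
    }

2.2 Streaming Forwarding & Single‑Threaded Execution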
Only the HTTP request line and headers are parsed; the body is streamed directly to the upstream service. This reduces latency and memory usage but introduces challenges such as thread‑safety, multi‑stage coordination and edge‑case handling. To mitigate these, the entire request‑processing pipeline (Netty server, business logic, Netty client) runs on the same event‑loop thread, while any blocking I/O is off‑loaded to dedicated thread pools and later switched back.
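The hand-off between the event loop and a blocking pool can be illustrated with plain JDK executors. This is only a minimal sketch, not the gateway's actual code: the class `EventLoopHandoff`, the thread names and the `blockingLookup` helper are hypothetical, and a single-thread executor stands in for a Netty event loop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a single-thread executor plays the role of the event loop.
final class EventLoopHandoff {
    // One thread only, so per-request state touched here needs no locking.
    static final ExecutorService EVENT_LOOP = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-loop");
        t.setDaemon(true);
        return t;
    });

    // Dedicated pool for calls that may block.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true);
        return t;
    });

    // Runs the blocking step on the pool, then switches back to the event loop.
    static CompletableFuture<String> handle(String key) {
        return CompletableFuture
                .supplyAsync(() -> blockingLookup(key), BLOCKING_POOL)
                .thenApplyAsync(
                        value -> value + " handled-on=" + Thread.currentThread().getName(),
                        EVENT_LOOP);
    }

    // Stand-in for blocking I/O (hypothetical helper).
    static String blockingLookup(String key) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key + " looked-up-on=" + Thread.currentThread().getName();
    }
}
```

Because the continuation is always scheduled on the single event-loop thread, code that runs after the blocking step can touch request state without synchronization, which is the property the single-threaded design relies on.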
2.3 Other Optimizations
Lazy loading of internal variables (e.g., cookies, query strings).
Off‑heap memory and zero‑copy techniques.
Adoption of JDK 11 with ZGC, yielding lower GC pause times.
Custom HTTP codec to handle non‑standard traffic and improve security.
Traffic governance to process oversized or malformed requests instead of rejecting them outright.
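Lazy loading, the first item in the list above, might look roughly like the following. It is a sketch under stated assumptions: `LazyQuery` and its `parseCount` counter are hypothetical names, URL decoding is omitted, and the class is deliberately not thread-safe because it is meant to be used from one event-loop thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical request helper: the raw query string is kept untouched
// and parsed only when a parameter is first requested.
final class LazyQuery {
    private final String rawQuery;
    private Map<String, String> parsed; // stays null until first access
    int parseCount = 0;                 // exposed only to show that parsing happens once

    LazyQuery(String rawQuery) {
        this.rawQuery = rawQuery;
    }

    String get(String name) {
        if (parsed == null) {
            parsed = parse();
        }
        return parsed.get(name);
    }

    private Map<String, String> parse() {
        parseCount++;
        Map<String, String> result = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return result;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.putIfAbsent(key, value); // first occurrence wins in this sketch
        }
        return result;
    }
}
```

Requests whose processors never read a query parameter pay no parsing cost at all, and requests that do read one pay it exactly once.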
Gateway Business Shape
The gateway acts as a unified ingress point, decoupling internal and external networks, providing common cross‑cutting concerns (security, authentication, rate‑limiting, monitoring) and enabling efficient traffic control for private protocols, link optimization and multi‑region active‑active deployments.
Gateway Governance
Governance is achieved through a control plane that manages protocol compatibility, routing, and module orchestration. Multi‑protocol support abstracts HTTP/1.x, HTTP/2, and proprietary SOTP protocols behind a unified model.
4.1 Multi‑Protocol Compatibility
A protocol‑adaptation layer hides differences in encoding and connection handling, while a common intermediate model lets business code operate independently of the underlying protocol.
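One way to picture the adaptation layer is a small adapter interface that decodes each wire format into a shared request model. The sketch below is purely illustrative: `GatewayRequest`, `ProtocolAdapter`, `Http1Adapter`, `ToyPrivateAdapter` and `BusinessLogic` are hypothetical names, and the `METHOD|PATH|k=v;k=v` format is made up for the example. It is not the SOTP wire format, which the article does not describe.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical common intermediate model seen by business code.
final class GatewayRequest {
    final String method;
    final String path;
    final Map<String, String> headers;

    GatewayRequest(String method, String path, Map<String, String> headers) {
        this.method = method;
        this.path = path;
        this.headers = headers;
    }
}

// Each protocol supplies its own decoder into the common model.
interface ProtocolAdapter {
    GatewayRequest decode(String wire);
}

// Decodes an HTTP/1.x head: a request line followed by "Name: value" lines.
final class Http1Adapter implements ProtocolAdapter {
    @Override
    public GatewayRequest decode(String wire) {
        String[] lines = wire.split("\r\n");
        String[] requestLine = lines[0].split(" ");
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        lines[i].substring(colon + 1).trim());
            }
        }
        return new GatewayRequest(requestLine[0], requestLine[1], headers);
    }
}

// Decodes a made-up private format "METHOD|PATH|k=v;k=v" (NOT the real SOTP format).
final class ToyPrivateAdapter implements ProtocolAdapter {
    @Override
    public GatewayRequest decode(String wire) {
        String[] parts = wire.split("\\|", 3);
        Map<String, String> headers = new LinkedHashMap<>();
        if (parts.length == 3 && !parts[2].isEmpty()) {
            for (String pair : parts[2].split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    headers.put(pair.substring(0, eq).toLowerCase(Locale.ROOT),
                            pair.substring(eq + 1));
                }
            }
        }
        return new GatewayRequest(parts[0], parts[1], headers);
    }
}

// Business code depends only on the common model, never on the wire format.
final class BusinessLogic {
    static boolean isHotelCall(GatewayRequest request) {
        return request.path.startsWith("/hotel/");
    }
}
```

With this shape, supporting another protocol means adding one adapter, while `BusinessLogic` stays unchanged.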
4.2 Routing Module
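Routes are declared as configuration. Each entry combines a matcher (here a URI prefix), descriptive tags and properties, and one or more routes, each with a condition, a zone and a list of weighted targets:

    {
        "type": "uri",
        "value": "/hotel/order",
        "matcherType": "prefix",
        "tags": ["owner_admin", "org_framework", "appId_123456"],
        "properties": {"core": "true"},
        "routes": [{
            "condition": "true",
            "zone": "PRO",
            "targets": [{"url": "http://test.ctrip.com/hotel", "weight": 100}]
        }]
    }

4.3 Module Orchestration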
Modules are scheduled per processing stage (e.g., PRE_RESPONSE) with configurable order, gray‑ratio, conditions and action parameters. This decouples feature implementation from the gateway core.
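The selection step described above could be sketched as follows. This is an illustration under assumptions, not the gateway's scheduler: `ModuleRule` and `ModuleScheduler` are hypothetical, the condition is reduced to a predicate on the request path, the stage name `PRE_REQUEST` used in the usage note is invented, and the gray ratio is assumed to mean "run for requests whose bucket in [0, 100) is below the ratio".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory form of one module rule.
final class ModuleRule {
    final String name;
    final String stage;
    final int ruleOrder;
    final int grayRatio;               // 0..100, share of traffic that runs the module
    final Predicate<String> condition; // simplified: evaluated against the request path

    ModuleRule(String name, String stage, int ruleOrder, int grayRatio,
               Predicate<String> condition) {
        this.name = name;
        this.stage = stage;
        this.ruleOrder = ruleOrder;
        this.grayRatio = grayRatio;
        this.condition = condition;
    }
}

final class ModuleScheduler {
    // Maps a request id to a stable bucket in [0, 100).
    static int bucketOf(String requestId) {
        return Math.floorMod(requestId.hashCode(), 100);
    }

    // Returns the names of the modules to run at the given stage, sorted by ruleOrder.
    static List<String> select(List<ModuleRule> rules, String stage, String path, int bucket) {
        List<ModuleRule> matched = new ArrayList<>();
        for (ModuleRule rule : rules) {
            if (rule.stage.equals(stage)
                    && bucket < rule.grayRatio
                    && rule.condition.test(path)) {
                matched.add(rule);
            }
        }
        matched.sort(Comparator.comparingInt((ModuleRule rule) -> rule.ruleOrder));
        List<String> names = new ArrayList<>();
        for (ModuleRule rule : matched) {
            names.add(rule.name);
        }
        return names;
    }
}
```

For example, with a `PRE_RESPONSE` module at gray ratio 100 and another at gray ratio 50, a request in bucket 10 runs both in `ruleOrder`, while a request in bucket 70 runs only the first. Deriving the bucket from a stable request attribute keeps the gray decision consistent for that request across stages.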
    {
        "name": "addResponseHeader",
        "stage": "PRE_RESPONSE",
        "ruleOrder": 0,
        "grayRatio": 100,
        "condition": "true",
        "actionParam": {
            "connection": "keep-alive",
            "x-service-call": "${request.func.remoteCost}",
            "Access-Control-Expose-Headers": "x-service-call",
            "x-gate-root-id": "${func.catRootMessageId}"
        },
        "exceptionHandle": "return"
    }

Conclusion
Ctrip chose to develop its own gateway rather than adopt existing solutions such as Zuul, Nginx, Spring Cloud Gateway or Istio, because it better fits the company’s specific business and technical ecosystem. Ongoing work includes exploring public vs. private gateway separation, HTTP/3, and tighter integration with Service Mesh.
Java High-Performance Architecture