
Design and Implementation of Ctrip's High‑Performance API Gateway

This article presents the evolution, architecture, and engineering practices of Ctrip's API gateway, covering its transition to full asynchronous processing with Netty and RxJava, stream‑based forwarding, single‑threaded event‑loop design, performance optimizations, multi‑protocol support, routing, and modular governance for large‑scale microservice environments.

Ctrip Technology

Since its first release in 2014, Ctrip's API gateway has been the standard entry point for external traffic and has evolved alongside the company's micro‑service architecture. As of July 2021 it fronted more than 3,000 services and handled about 20 billion requests per day.

The early implementation was based on Netflix OSS concepts, using a Zuul‑like core built on Tomcat NIO, an asynchronous servlet model, Apache HttpClient for synchronous client calls, and essential components such as Archaius, Hystrix, and Groovy for dynamic configuration, circuit breaking, and hot‑updates.

To address growing latency and thread‑pool pressure caused by overseas traffic and service‑scale expansion, Ctrip embarked on a fully asynchronous redesign. The new design makes the server side, the business‑processing flow, and the client side asynchronous end to end, using Netty (NIO/Epoll + EventLoop) for I/O and RxJava for reactive flow control.

Asynchronous Flow Design

All three layers (server, business, client) are built on Netty, while business logic is transformed into asynchronous stages. Common async scenarios include request validation, authentication, IO events, and request forwarding. The design addresses challenges such as process and state management, exception handling, context propagation, thread scheduling, and traffic control.

The key abstractions are a Processor interface and an AbstractProcessor base class. Both expose a unified Maybe<T> result, so synchronous and asynchronous implementations are handled the same way. The processing engine composes inbound, outbound, error, and log stages using RxJava utilities such as RxUtil.concat, and enforces time‑outs, error fallback, and resource cleanup.

import java.util.concurrent.TimeUnit;

// RxJava 2 package names, assumed from the use of ScalarCallable
import io.reactivex.Maybe;
import io.reactivex.internal.fuseable.ScalarCallable;
import io.reactivex.schedulers.Schedulers;

// RequestContext and ProcessorType are gateway-internal types not shown in this article.
public interface Processor<T> {
    ProcessorType getType();
    int getOrder();
    boolean shouldProcess(RequestContext context);
    // Every processor, sync or async, exposes its result as a Maybe
    Maybe<T> process(RequestContext context) throws Exception;
}

public abstract class AbstractProcessor<T> implements Processor<T> {
    // Synchronous processing without a response
    protected void processSync(RequestContext context) throws Exception {}

    // Synchronous processing with a response (e.g., health check)
    protected T processSyncAndGetReponse(RequestContext context) throws Exception {
        processSync(context);
        return null;
    }

    // Asynchronous processing (e.g., auth, remote calls); by default wraps the sync result
    protected Maybe<T> processAsync(RequestContext context) throws Exception {
        T response = processSyncAndGetReponse(context);
        return response == null ? Maybe.empty() : Maybe.just(response);
    }

    @Override
    public Maybe<T> process(RequestContext context) throws Exception {
        Maybe<T> maybe = processAsync(context);
        if (maybe instanceof ScalarCallable) {
            // Maybe.just / Maybe.empty are already resolved: synchronous path, no timeout needed
            return maybe;
        } else {
            // Asynchronous path: bound the wait and run the timeout on the request's event loop
            return maybe.timeout(getAsyncTimeout(context), TimeUnit.MILLISECONDS,
                Schedulers.from(context.getEventloop()), timeoutFallback(context));
        }
    }

    // Timeout for asynchronous processors, in milliseconds
    protected long getAsyncTimeout(RequestContext context) { return 2000; }

    protected Maybe<T> timeoutFallback(RequestContext context) { return Maybe.empty(); }
}
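
The article does not show how the engine chains stages, so the following is only a rough sketch of the idea, not Ctrip's implementation. It swaps RxJava for the JDK's CompletableFuture so that it runs without extra dependencies. StagePipelineSketch, Stage, run, and demo are hypothetical names. Inbound stages run in order, an asynchronous stage that exceeds its time‑out fails the chain, the error stage runs only on failure, and the log stage always runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the gateway's code base.
final class StagePipelineSketch {
    /** A stage records its work in the trace and signals completion through a future. */
    interface Stage {
        CompletableFuture<Void> apply(List<String> trace);
    }

    static CompletableFuture<List<String>> run(List<Stage> inbound, Stage error, Stage log,
                                               long timeoutMs) {
        List<String> trace = new ArrayList<>(); // stands in for the request context
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (Stage stage : inbound) {
            // Each stage starts only after the previous one has completed normally.
            chain = chain.thenCompose(v -> withTimeout(stage.apply(trace), timeoutMs));
        }
        return chain
            .handle((v, failure) -> failure) // null on success, the Throwable otherwise
            .thenCompose(failure -> failure == null
                ? CompletableFuture.<Void>completedFuture(null)
                : error.apply(trace))
            .thenCompose(v -> log.apply(trace)) // the log stage always runs
            .thenApply(v -> trace);
    }

    /** Already-completed futures are the synchronous path and need no timer. */
    static CompletableFuture<Void> withTimeout(CompletableFuture<Void> f, long timeoutMs) {
        return f.isDone() ? f : f.orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    static Stage named(String name) {
        return trace -> {
            trace.add(name);
            return CompletableFuture.completedFuture(null);
        };
    }

    /** Demo: an auth stage, optionally a stage that never completes, then a route stage. */
    static List<String> demo(boolean stall) {
        List<Stage> inbound = new ArrayList<>();
        inbound.add(named("auth"));
        if (stall) inbound.add(trace -> new CompletableFuture<>());
        inbound.add(named("route"));
        return run(inbound, named("error"), named("log"), 50).join();
    }
}
```

Calling StagePipelineSketch.demo(false) walks the happy path, while demo(true) lets the stalled stage time out after 50 ms, which skips the route stage and triggers the error stage before logging.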

Stream Forwarding & Single‑Threaded Execution

By parsing only the HTTP header and forwarding the body directly, the gateway reduces latency and memory footprint. Streamed processing introduces complexity such as thread‑safety, multi‑stage coordination, and edge‑case handling, which Ctrip mitigates by binding the entire request lifecycle to a single Netty event‑loop, eliminating concurrent state mutations.
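
To illustrate why binding a request to one event loop removes the need for locks, here is a minimal sketch under stated assumptions: a JDK single‑threaded executor stands in for a Netty event loop, and PinnedRequest, onBodyChunk, finish, and demo are hypothetical names. Body chunks may arrive from any thread, but every state change is posted to the request's own loop, so plain fields are sufficient.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a single-threaded executor stands in for a Netty event loop.
final class PinnedRequest {
    private final ExecutorService loop;
    private long forwardedBytes;                             // plain field, no lock or volatile
    private final Set<String> threadsSeen = new HashSet<>(); // not a concurrent collection

    PinnedRequest(ExecutorService loop) {
        this.loop = loop;
    }

    /** Callable from any thread; the state change itself always runs on the loop thread. */
    void onBodyChunk(byte[] chunk) {
        loop.execute(() -> {
            forwardedBytes += chunk.length; // stand-in for writing the chunk upstream
            threadsSeen.add(Thread.currentThread().getName());
        });
    }

    /** Queued behind earlier chunks, so it reads the totals: {bytes, distinct threads}. */
    CompletableFuture<long[]> finish() {
        return CompletableFuture.supplyAsync(
            () -> new long[] {forwardedBytes, threadsSeen.size()}, loop);
    }

    /** Demo: several producer threads feed chunks to one request. */
    static long[] demo(int producers, int chunksEach, int chunkSize) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            PinnedRequest request = new PinnedRequest(loop);
            CompletableFuture<?>[] feeds = new CompletableFuture<?>[producers];
            for (int i = 0; i < producers; i++) {
                feeds[i] = CompletableFuture.runAsync(() -> {
                    for (int j = 0; j < chunksEach; j++) {
                        request.onBodyChunk(new byte[chunkSize]);
                    }
                });
            }
            CompletableFuture.allOf(feeds).join(); // all chunks are queued before finish()
            return request.finish().join();
        } finally {
            loop.shutdown();
        }
    }
}
```

With four producers sending 100 chunks of 10 bytes each, the demo accounts for all 4,000 bytes and records exactly one thread touching the request state. The trade‑off is that a stage must never block the loop, which is one reason the processing flow has to be asynchronous throughout.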

Additional optimizations include lazy loading of request fields, off‑heap memory with zero‑copy, and adoption of the ZGC collector in JDK 11 to lower GC pause times.
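
Lazy loading of request fields could look roughly like the sketch below. LazyRequest and its methods are hypothetical names, and the article does not say which fields the gateway defers. The query string is parsed only when a stage first asks for a parameter, so requests that are routed by path alone never pay for the parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not thread-safe on purpose, since one event loop owns the request.
final class LazyRequest {
    private final String rawUri;
    private Map<String, String> query; // stays null until a caller asks for a parameter
    private int parseCount;            // instrumentation for the demo only

    LazyRequest(String rawUri) {
        this.rawUri = rawUri;
    }

    /** Cheap: routing by path never touches the query string. */
    String path() {
        int q = rawUri.indexOf('?');
        return q < 0 ? rawUri : rawUri.substring(0, q);
    }

    /** Parses the query string on first use and reuses the result afterwards. */
    String queryParam(String name) {
        if (query == null) {
            query = parseQuery();
        }
        return query.get(name);
    }

    int parseCount() {
        return parseCount;
    }

    private Map<String, String> parseQuery() {
        parseCount++;
        Map<String, String> parsed = new LinkedHashMap<>();
        int q = rawUri.indexOf('?');
        if (q < 0 || q == rawUri.length() - 1) {
            return parsed;
        }
        for (String pair : rawUri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            // Percent-decoding is omitted to keep the sketch short.
            if (eq < 0) {
                parsed.put(pair, "");
            } else {
                parsed.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return parsed;
    }
}
```

The missing synchronization is deliberate: it is only safe because a single event loop owns each request for its whole lifecycle.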

Gateway Business Forms

The gateway serves as a decoupling layer for diverse network environments (intranet, internet, IDC zones) and provides universal cross‑cutting concerns such as security, authentication, routing, rate‑limiting, monitoring, and alerting. It also supports private protocols (SOTP), link optimization, and multi‑region active‑active deployments.

Governance & Multi‑Protocol Compatibility

To simplify operations across multiple clusters and protocols, Ctrip introduced a control plane that standardizes protocol adapters, defines a common intermediate model, and manages routing, module orchestration, and gray‑release configurations. Routing rules are expressed in JSON, supporting URI prefix matching, tag‑based access control, and weighted target selection.

{
    "type": "uri",
    "value": "/hotel/order",
    "matcherType": "prefix",
    "tags": ["owner_admin", "org_framework", "appId_123456"],
    "properties": {"core": "true"},
    "routes": [{
        "condition": "true",
        "zone": "PRO",
        "targets": [{"url": "http://test.ctrip.com/hotel", "weight": 100}]
    }]
}
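
A rule like the one above might be evaluated along the following lines. This is a sketch under assumptions, not the gateway's routing code: RouteTableSketch, add, and route are hypothetical names, the longest‑prefix tie‑break is an assumption (the source only says "prefix"), and the a.example and b.example URLs are placeholders. Tags, conditions, and zones are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of URI prefix matching plus weighted target selection.
final class RouteTableSketch {
    private static final class Rule {
        final String prefix;
        final String[] urls;
        final int[] weights;

        Rule(String prefix, String[] urls, int[] weights) {
            this.prefix = prefix;
            this.urls = urls;
            this.weights = weights;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    RouteTableSketch add(String prefix, String[] urls, int[] weights) {
        rules.add(new Rule(prefix, urls, weights));
        return this;
    }

    /** Longest matching prefix wins; this tie-break is an assumption of the sketch. */
    private Rule match(String uri) {
        Rule best = null;
        for (Rule r : rules) {
            if (uri.startsWith(r.prefix)
                    && (best == null || r.prefix.length() > best.prefix.length())) {
                best = r;
            }
        }
        return best;
    }

    /**
     * roll is any number supplied by the caller, such as a random int or a hash of a
     * user id. Returns the chosen target URL, or null if no rule matches.
     */
    String route(String uri, int roll) {
        Rule rule = match(uri);
        if (rule == null) return null;
        int total = 0;
        for (int w : rule.weights) total += w;
        if (total <= 0) return null;
        int point = Math.floorMod(roll, total); // position within [0, total)
        int cumulative = 0;
        for (int i = 0; i < rule.urls.length; i++) {
            cumulative += rule.weights[i];
            if (point < cumulative) return rule.urls[i];
        }
        return null; // not reached when all weights are non-negative
    }
}
```

Passing the roll in from outside keeps the selection deterministic and easy to test. With weights 90 and 10, rolls 0 to 89 go to the first target and rolls 90 to 99 go to the second.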

Module orchestration allows the control plane to inject custom processing stages (e.g., adding response headers) with configurable order, gray‑ratio, and exception handling.

{
    "name": "addResponseHeader",
    "stage": "PRE_RESPONSE",
    "ruleOrder": 0,
    "grayRatio": 100,
    "condition": "true",
    "actionParam": {
        "connection": "keep-alive",
        "x-service-call": "${request.func.remoteCost}",
        "Access-Control-Expose-Headers": "x-service-call",
        "x-gate-root-id": "${func.catRootMessageId}"
    },
    "exceptionHandle": "return"
}
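
As a rough illustration of how such a module could be applied, the sketch below gates the module by grayRatio and expands ${...} placeholders in actionParam. HeaderModuleSketch, inGray, render, and apply are hypothetical names, and the bucket scheme (a stable per‑request value in [0, 100)) is an assumption. The condition and exceptionHandle fields are not modeled because the source does not define their semantics.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a response-header module with gray release and placeholders.
final class HeaderModuleSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** bucket is a stable value in [0, 100) derived from the request, e.g. a hash of its id. */
    static boolean inGray(int grayRatio, int bucket) {
        return bucket < grayRatio;
    }

    /** Replaces each ${name} with its value; unknown names are left untouched. */
    static String render(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // quoteReplacement keeps '$' and '\' in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the headers the module would add, or an empty map if the request is not in gray. */
    static Map<String, String> apply(Map<String, String> actionParam, Map<String, String> vars,
                                     int grayRatio, int bucket) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (!inGray(grayRatio, bucket)) {
            return headers; // module skipped for this request
        }
        for (Map.Entry<String, String> e : actionParam.entrySet()) {
            headers.put(e.getKey(), render(e.getValue(), vars));
        }
        return headers;
    }
}
```

With grayRatio set to 100, as in the configuration above, every request receives the headers. Lowering the ratio limits the module to a matching share of buckets, which allows a new rule to be rolled out gradually.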

In summary, Ctrip's self‑developed gateway combines full asynchronous processing, stream‑based forwarding, single‑event‑loop execution, and a flexible control plane to achieve high performance, scalability, and maintainability for billions of daily requests across multiple protocols and regions.
