How Ximalaya Scaled Its Gateway to 200B Daily Calls: Architecture & Optimizations

This article traces the evolution of Ximalaya's HTTP gateway from a Tomcat NIO prototype to a fully asynchronous Netty design. It covers the architecture, performance bottlenecks, traffic management features, monitoring, GC tuning, and future plans for HTTP/2 and graceful degradation.

Java High-Performance Architecture

Background

Gateways are mature middleware at major internet companies: they centralize the features common to all services so that those features can be updated quickly in one place. Ximalaya’s gateway processes 200 billion daily calls, with peak QPS over 40k, supporting 500+ web services for over 600 million users.

First Version: Tomcat NIO + AsyncServlet

The initial design used Tomcat with NIO and AsyncServlet. The architecture required a separate Push layer and an HttpNioClient for backend communication. However, performance bottlenecks appeared at about 5k QPS due to Tomcat’s request caching, memory copies, and blocking body reads.

Tomcat issues

Excessive object pooling, which causes GC pressure under high traffic.

Extra heap memory copies when passing data between Tomcat and Netty.

Blocking body reads, even under the NIO model.

HttpNioClient issues

Lock contention on connection acquire/release.

Second Version: Netty + Full Asynchronous

Switching to Netty eliminated the above problems, delivering a fully asynchronous, lock‑free, layered architecture.

Access Layer

Netty I/O threads handle HTTP encoding and decoding as well as protocol‑level monitoring. Request and response size limits and attack detection are enforced here, so oversized requests receive an immediate 400 response.
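
The size check described above can be sketched with plain JDK types. This is a minimal illustration, not the gateway's actual code: the class name `RequestSizeGuard`, the method `reject`, and the limits are all assumptions made for the example.

```java
import java.util.OptionalInt;

/**
 * Hypothetical access-layer guard. The class name and the limits are
 * illustrative assumptions, not Ximalaya's actual code.
 */
final class RequestSizeGuard {
    private final int maxUriLength;
    private final long maxBodyBytes;

    RequestSizeGuard(int maxUriLength, long maxBodyBytes) {
        this.maxUriLength = maxUriLength;
        this.maxBodyBytes = maxBodyBytes;
    }

    /**
     * Returns the status code to send immediately, or empty when the
     * request is within limits and may continue down the pipeline.
     */
    OptionalInt reject(String uri, long declaredContentLength) {
        if (uri.length() > maxUriLength) {
            return OptionalInt.of(400); // oversized request line
        }
        if (declaredContentLength > maxBodyBytes) {
            return OptionalInt.of(400); // oversized body, rejected before it is read
        }
        return OptionalInt.empty();
    }
}
```

Checking the declared length before reading the body keeps the cost of rejecting an oversized request close to zero, which is the point of doing it in the access layer.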

Business Logic Layer

Implements shared features through a filter‑based design: authentication, black/white lists, flow control, intelligent circuit breaking, gray release, unified degradation, traffic scheduling, traffic copy, and request sampling.
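
A filter‑based design of this kind might look like the sketch below. All type names (`GatewayRequest`, `GatewayFilter`, `BlacklistFilter`, `QuotaFilter`, `FilterChain`) are hypothetical, and the fixed quota is a deliberately simplified stand-in for real flow control.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal request view; a real gateway would also carry headers, body, etc. */
final class GatewayRequest {
    final String clientIp;
    final String path;

    GatewayRequest(String clientIp, String path) {
        this.clientIp = clientIp;
        this.path = path;
    }
}

/** A filter returns empty to continue, or a status code to short-circuit. */
interface GatewayFilter {
    OptionalInt apply(GatewayRequest request);
}

final class BlacklistFilter implements GatewayFilter {
    private final Set<String> blockedIps;

    BlacklistFilter(Set<String> blockedIps) {
        this.blockedIps = blockedIps;
    }

    @Override
    public OptionalInt apply(GatewayRequest request) {
        return blockedIps.contains(request.clientIp) ? OptionalInt.of(403) : OptionalInt.empty();
    }
}

/** Simplified flow control: a fixed number of permits, decremented atomically. */
final class QuotaFilter implements GatewayFilter {
    private final AtomicInteger remaining;

    QuotaFilter(int permits) {
        this.remaining = new AtomicInteger(permits);
    }

    @Override
    public OptionalInt apply(GatewayRequest request) {
        int before = remaining.getAndUpdate(n -> n > 0 ? n - 1 : 0);
        return before > 0 ? OptionalInt.empty() : OptionalInt.of(429);
    }
}

final class FilterChain {
    private final List<GatewayFilter> filters;

    FilterChain(List<GatewayFilter> filters) {
        this.filters = filters;
    }

    /** Runs filters in order; 200 stands in for "forward to the backend". */
    int handle(GatewayRequest request) {
        for (GatewayFilter filter : filters) {
            OptionalInt status = filter.apply(request);
            if (status.isPresent()) {
                return status.getAsInt();
            }
        }
        return 200;
    }
}
```

Because each feature is an independent filter, adding or reordering features only changes the list passed to the chain.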

Service Call Layer

Asynchronous service calls use Netty’s connection pool with lock‑free operations. The timeout timer starts only after the request has been flushed successfully, so time spent before the write completes is not unfairly counted against the backend.
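
To illustrate what lock‑free acquire and release can mean, here is a small pool built on JDK compare‑and‑swap primitives. It is a sketch under assumptions: `LockFreePool` is a hypothetical class, and it does not reproduce Netty's own pool implementation.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical pool: no synchronized blocks or locks on the acquire/release path. */
final class LockFreePool<C> {
    private final ConcurrentLinkedDeque<C> idle = new ConcurrentLinkedDeque<>();
    private final AtomicInteger total = new AtomicInteger();
    private final int maxTotal;
    private final Supplier<C> factory;

    LockFreePool(int maxTotal, Supplier<C> factory) {
        this.maxTotal = maxTotal;
        this.factory = factory;
    }

    /** Returns an idle or new connection, or null when the pool is exhausted. */
    C acquire() {
        C idleConnection = idle.pollFirst();
        if (idleConnection != null) {
            return idleConnection;
        }
        while (true) {
            int current = total.get();
            if (current >= maxTotal) {
                return null; // caller decides whether to queue or fail fast
            }
            // Reserve a slot with CAS before creating, so the cap is never exceeded.
            if (total.compareAndSet(current, current + 1)) {
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    total.decrementAndGet(); // give the slot back on failure
                    throw e;
                }
            }
        }
    }

    /** Returns a connection to the pool; most recently used is reused first. */
    void release(C connection) {
        idle.offerFirst(connection);
    }
}
```

Contending threads retry a failed CAS instead of parking on a lock, which avoids the acquire/release contention noted for the first version's HttpNioClient.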

Full‑Link Timeout Mechanism

Applies a timeout at every stage of a request: protocol parsing, queue waiting, connection establishment, link waiting, pre‑write, write, and response.
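
One way to track an overall budget alongside a response timeout that only starts at flush is sketched below. `RequestTimer` and its parameters are hypothetical, and the injected clock exists only to make the behavior easy to verify; production code would pass `System::nanoTime`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical per-request timer, intended to be confined to one thread
 * (for example, the event loop that owns the request).
 */
final class RequestTimer {
    private final LongSupplier nanoClock;
    private final long startNanos;
    private final long totalBudgetNanos;
    private final long responseTimeoutNanos;
    private boolean flushed;
    private long flushedAtNanos;

    RequestTimer(LongSupplier nanoClock, long totalBudgetMillis, long responseTimeoutMillis) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
        this.totalBudgetNanos = TimeUnit.MILLISECONDS.toNanos(totalBudgetMillis);
        this.responseTimeoutNanos = TimeUnit.MILLISECONDS.toNanos(responseTimeoutMillis);
    }

    /** Call once the request has been flushed to the backend successfully. */
    void markFlushed() {
        flushed = true;
        flushedAtNanos = nanoClock.getAsLong();
    }

    /** True when the whole request, across all stages, has run out of budget. */
    boolean totalExpired() {
        return nanoClock.getAsLong() - startNanos > totalBudgetNanos;
    }

    /** The response clock only runs after the flush, never before it. */
    boolean responseExpired() {
        return flushed && nanoClock.getAsLong() - flushedAtNanos > responseTimeoutNanos;
    }
}
```

Separating the two clocks means a slow queue or connect phase shows up as a total‑budget timeout rather than being blamed on the backend's response time.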

Monitoring & Alarm

Provides per‑second monitoring and alerting for protocol‑level attacks, oversized requests, latency, QPS, bandwidth, response codes, link health, failure rates, and traffic jitter, with the data stored in InfluxDB.
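
Per‑second counting can be done cheaply on hot paths with `LongAdder`. The sketch below is an assumption about how such a counter might be structured; `SecondBucketCounter` is a hypothetical name, and shipping the drained values to InfluxDB is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical QPS counter that groups events into one-second buckets. */
final class SecondBucketCounter {
    private final ConcurrentHashMap<Long, LongAdder> buckets = new ConcurrentHashMap<>();

    /** Records one event at the given wall-clock time. */
    void record(long epochMillis) {
        buckets.computeIfAbsent(epochMillis / 1000, second -> new LongAdder()).increment();
    }

    /** Current count for a second, without removing it. */
    long countAt(long epochSecond) {
        LongAdder adder = buckets.get(epochSecond);
        return adder == null ? 0 : adder.sum();
    }

    /** Removes a finished second and returns its count, ready to be reported. */
    long drain(long epochSecond) {
        LongAdder adder = buckets.remove(epochSecond);
        return adder == null ? 0 : adder.sum();
    }
}
```

A reporter thread would typically call `drain` for the previous second once per second, so the map stays small.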

Performance Optimizations

Object Pooling: Reuse frequently allocated objects (e.g., thread‑pool tasks, StringBuffer) to reduce allocation and GC pressure.
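
As a small illustration of the idea, the sketch below reuses one buffer per thread. It swaps in `StringBuilder` for the `StringBuffer` mentioned above, and `PooledBuilders` with its capacity limit is a hypothetical helper, not the gateway's pooling code.

```java
/** Hypothetical per-thread buffer reuse; the sizes are illustrative. */
final class PooledBuilders {
    private static final int INITIAL_CAPACITY = 256;
    private static final int MAX_RETAINED_CAPACITY = 4096;

    private static final ThreadLocal<StringBuilder> LOCAL =
            ThreadLocal.withInitial(() -> new StringBuilder(INITIAL_CAPACITY));

    private PooledBuilders() {
    }

    /**
     * Returns this thread's builder, emptied. The caller must finish with it
     * before borrowing again on the same thread.
     */
    static StringBuilder borrow() {
        StringBuilder builder = LOCAL.get();
        if (builder.capacity() > MAX_RETAINED_CAPACITY) {
            // Do not keep an unusually large buffer alive forever.
            builder = new StringBuilder(INITIAL_CAPACITY);
            LOCAL.set(builder);
        }
        builder.setLength(0);
        return builder;
    }
}
```

The capacity cap matters: pooling without a bound is the same excess that caused GC pressure in the Tomcat version.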

Context Switching: The asynchronous design reduces thread switches; tuning showed a 20% reduction in CPU context switches when pushing through Netty I/O threads.

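GC Tuning: Use a large young generation, SurvivorRatio = 2, and a max tenuring age of 15, and handle socket finalizers carefully to prevent old‑generation growth. Objects with finalizers cannot be reclaimed in the collection that first finds them unreachable, so they live longer than ordinary garbage. The finalizer in question: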

/**
 * Cleans up if the user forgets to close it.
 */
protected void finalize() throws IOException {
    close();
}
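
To make the SurvivorRatio setting concrete: in HotSpot, `-XX:SurvivorRatio` is the ratio of eden to one survivor space, and the young generation holds eden plus two survivor spaces. The arithmetic is sketched below; `YoungGenLayout` is a hypothetical helper, and real JVMs round these sizes for alignment.

```java
/** Hypothetical helper that splits a young generation by SurvivorRatio. */
final class YoungGenLayout {
    private YoungGenLayout() {
    }

    /** young = eden + 2 * survivor, and eden = ratio * survivor. */
    static long survivorBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes / (survivorRatio + 2);
    }

    static long edenBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes - 2 * survivorBytes(youngGenBytes, survivorRatio);
    }
}
```

With SurvivorRatio = 2 and an assumed 4 GiB young generation, each survivor space is about 1 GiB and eden about 2 GiB. Large survivor spaces combined with a tenuring age of 15 (`-XX:MaxTenuringThreshold=15`) give short‑lived request objects many chances to die before promotion.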

Logging: Avoid synchronous console appender flushing; use asynchronous logging so that I/O threads are never blocked on log output.
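
The principle behind asynchronous logging can be shown with a bounded queue and a background writer. This is a simplified sketch, not a replacement for a logging framework's async appender; `AsyncLogBuffer` and its drop‑when‑full policy are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical log buffer: I/O threads enqueue, a daemon thread writes. */
final class AsyncLogBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    AsyncLogBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never waits for space: when the buffer is full the line is dropped. */
    boolean log(String line) {
        if (queue.offer(line)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Writes everything currently buffered and returns the number of lines. */
    int drainTo(Consumer<String> sink) {
        int written = 0;
        String line;
        while ((line = queue.poll()) != null) {
            sink.accept(line);
            written++;
        }
        return written;
    }

    /** Starts a daemon thread that performs the slow writes off the I/O threads. */
    Thread startWriter(Consumer<String> sink) {
        Thread writer = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sink.accept(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "async-log-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }
}
```

Dropping lines under overload is a trade‑off: it loses some logs but guarantees that a slow console or disk can never stall request handling.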

Future Plans

Upgrade to HTTP/2 for multiplexed connections, continue improving monitoring accuracy, and enhance graceful degradation mechanisms to ensure site‑wide reliability.

Conclusion

The gateway has become a standard component in large‑scale internet services. The article shares practical insights and ongoing improvements, inviting interested engineers to join the project.

Source: https://www.jianshu.com/p/165b1941cdfa