How Ximalaya Scaled Its Gateway to 200 Billion Requests: Backend Architecture Lessons

This article traces the evolution of Ximalaya's gateway from Tomcat to a Netty-driven, fully asynchronous architecture that handles over 200 billion calls per day. It covers design challenges, performance bottlenecks, traffic management features, monitoring, and plans for HTTP/2.

ITFLY8 Architecture Home

Background

Gateways are essential middleware in large internet companies: they let public features be rolled out quickly without updating every service. Ximalaya serves over 600 million users, runs 500+ web services, processes more than 200 billion calls daily, and peaks at 40k QPS per instance.

The gateway must provide reverse proxying, black/white lists, flow control, authentication, circuit breaking, API publishing, monitoring, and alerting, as well as advanced capabilities such as traffic scheduling, traffic copy, pre-release, intelligent upgrade/downgrade, and warm-up.

First Version: Tomcat NIO + AsyncServlet

The initial design used Tomcat with asynchronous servlets. To avoid blocking the request thread while calling backend services, the gateway had to be fully async. However, performance quickly hit a ceiling: at ~5 k QPS the JVM suffered frequent full GCs due to Tomcat's large object pools, request processor caching, and memory leaks in Servlet 3.0 async handling.

Tomcat Issues

Excessive caching leads to high GC pressure.

Heap-based buffers cause extra memory copies when interacting with Netty-based backends.

Reading the request body is blocking, unlike Netty's non‑blocking model.

These problems motivated a rewrite of the ingress layer.
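To make the buffer point concrete, here is a minimal sketch using only `java.nio`, not Tomcat or Netty code. A heap `ByteBuffer` is backed by a `byte[]` on the Java heap, while a direct buffer lives in native memory; the helper below performs the heap-to-direct copy explicitly. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class BufferCopyDemo {
    /** Copies the readable bytes of a heap buffer into a newly allocated direct buffer. */
    static ByteBuffer toDirect(ByteBuffer heap) {
        ByteBuffer direct = ByteBuffer.allocateDirect(heap.remaining());
        direct.put(heap.duplicate()); // duplicate() leaves the caller's position untouched
        direct.flip();                // switch the direct buffer from writing to reading
        return direct;
    }
}
```

Keeping payloads in direct buffers from ingress to egress is one way to avoid paying for such a copy on every request.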

Second Version: Netty + Full Asynchrony

Switching to Netty eliminated the above bottlenecks. The new architecture is fully asynchronous, lock‑free, and layered:

Ingress Layer: Netty I/O threads handle HTTP encoding and decoding, enforce request size limits, and detect attacks. Oversized requests receive an immediate 400 response.

Business Logic Layer: Implements public features (auth, blacklist, flow control, circuit breaking, gray release, traffic scheduling, traffic copy, log sampling) as a responsibility chain that performs no I/O.

Service Call Layer: Uses Netty's connection pool for lock-free HTTP calls and supports full-link timeout handling (protocol parsing, queue wait, connection establishment, write timeout, response timeout, etc.).

Asynchronous Push: After sending a request, the worker thread continues with other work; a context bound to the connection is used to write the response when it arrives.

Connection Pool: Manages keep-alive connections, handling Connection: close, idle timeouts, read/write timeouts, and FIN/RESET, and releases a connection back to the pool only after the backend response has been received.
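The responsibility-chain pattern named for the business logic layer can be sketched in plain Java as follows. This is an illustration under assumptions, not Ximalaya's code: `GatewayRequest`, `GatewayFilter`, and `FilterChain` are hypothetical names, and only a blacklist check and a token check are shown.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical request model; the fields are illustrative only. */
final class GatewayRequest {
    final String clientIp;
    final String token;

    GatewayRequest(String clientIp, String token) {
        this.clientIp = clientIp;
        this.token = token;
    }
}

/** One link in the chain: returns a rejection reason, or empty to let the request continue. */
interface GatewayFilter {
    Optional<String> check(GatewayRequest request);
}

final class FilterChain {
    private final List<GatewayFilter> filters;

    FilterChain(List<GatewayFilter> filters) {
        this.filters = List.copyOf(filters);
    }

    /** Runs the filters in order and stops at the first rejection; no I/O happens here. */
    String process(GatewayRequest request) {
        for (GatewayFilter filter : filters) {
            Optional<String> rejection = filter.check(request);
            if (rejection.isPresent()) {
                return "REJECTED: " + rejection.get();
            }
        }
        return "FORWARD";
    }

    /** Builds a small example chain: blacklist check first, then a token check. */
    static FilterChain example(Set<String> blacklistedIps) {
        GatewayFilter blacklist = r -> blacklistedIps.contains(r.clientIp)
                ? Optional.of("blacklisted") : Optional.<String>empty();
        GatewayFilter auth = r -> (r.token == null || r.token.isEmpty())
                ? Optional.of("unauthenticated") : Optional.<String>empty();
        return new FilterChain(List.of(blacklist, auth));
    }
}
```

Because each filter only inspects in-memory state, the whole chain can run on the calling thread without blocking it.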
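The asynchronous push idea, where the caller does not wait for the backend and a saved context writes the response later, might look like the sketch below. It uses the JDK's `CompletableFuture` rather than Netty channels, and `AsyncBackend`, `ClientContext`, and `AsyncPushDemo` are hypothetical stand-ins.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a backend client whose responses arrive later. */
final class AsyncBackend {
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers the request and returns immediately; no thread waits for the reply. */
    CompletableFuture<String> send(long requestId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(requestId, future);
        return future;
    }

    /** Called when the backend response arrives (by an I/O thread in a real gateway). */
    void onResponse(long requestId, String body) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future != null) {
            future.complete(body);
        }
    }
}

/** Hypothetical per-request context that remembers where the reply must be written. */
final class ClientContext {
    final StringBuilder written = new StringBuilder();

    void write(String response) {
        written.append(response);
    }
}

final class AsyncPushDemo {
    /** Returns what the client had received before and after the backend replied. */
    static String run() {
        AsyncBackend backend = new AsyncBackend();
        ClientContext ctx = new ClientContext();
        backend.send(1L).thenAccept(ctx::write);   // returns at once, the thread is free
        String before = ctx.written.toString();    // nothing has arrived yet
        backend.onResponse(1L, "200 OK");          // the callback writes via the context
        return before + "|" + ctx.written;
    }
}
```

In this sketch `AsyncPushDemo.run()` yields `"|200 OK"`: empty before the response arrives, filled in afterwards by the callback.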
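The connection pool rules can also be sketched. The version below is a simplification that assumes each pool instance is confined to a single thread, which is one way to avoid locks; it is not Netty's pool API. `PooledConnection` and `KeepAlivePool` are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical pooled connection; a real gateway would wrap a network channel. */
final class PooledConnection {
    final int id;
    boolean open = true;   // set to false on FIN/RESET or "Connection: close"
    long lastReleasedAt;   // used for idle-timeout checks

    PooledConnection(int id) {
        this.id = id;
    }
}

/** Keep-alive pool sketch; assumes one instance per thread, so no locking is needed. */
final class KeepAlivePool {
    private final Deque<PooledConnection> idle = new ArrayDeque<>();
    private final long idleTimeoutMillis;
    private int nextId = 1;

    KeepAlivePool(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Reuses a healthy idle connection, discarding closed or expired ones. */
    PooledConnection acquire(long nowMillis) {
        PooledConnection c;
        while ((c = idle.pollFirst()) != null) {
            if (c.open && nowMillis - c.lastReleasedAt <= idleTimeoutMillis) {
                return c;
            }
            c.open = false; // expired or already closed: drop it
        }
        return new PooledConnection(nextId++);
    }

    /** Call only after the full backend response has been read. */
    void release(PooledConnection c, boolean keepAlive, long nowMillis) {
        if (!keepAlive || !c.open) {
            c.open = false; // "Connection: close" or a broken connection is never reused
            return;
        }
        c.lastReleasedAt = nowMillis;
        idle.addFirst(c);
    }
}
```

Releasing only after the response is complete matters because a connection returned early could be handed to a second request while the first response is still in flight.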

Full‑Link Timeout Mechanism

Protocol parsing timeout

Queue wait timeout

Connection establishment timeout

Write timeout (including large POST bodies)

Response timeout
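One way to track the stages listed above is to record when each stage starts and compare the elapsed time with a per-stage limit. The sketch below assumes that design; `Stage` and `RequestTimeouts` are hypothetical names, the limits are made up, and a real gateway would call `timedOut` from a timer rather than by hand.

```java
import java.util.EnumMap;
import java.util.Map;

/** The five stages named in the list above. */
enum Stage { PROTOCOL_PARSE, QUEUE_WAIT, CONNECT, WRITE, RESPONSE }

/** Tracks which stage a request is in and whether that stage has overrun its limit. */
final class RequestTimeouts {
    private final Map<Stage, Long> limitsMs = new EnumMap<>(Stage.class);
    private Stage current;
    private long stageStartedAt;

    /** Sets the limit for one stage; returns this so calls can be chained. */
    RequestTimeouts limit(Stage stage, long millis) {
        limitsMs.put(stage, millis);
        return this;
    }

    /** Marks the start of a stage; the previous stage is considered finished. */
    void enter(Stage stage, long nowMillis) {
        current = stage;
        stageStartedAt = nowMillis;
    }

    /** Returns the stage that has overrun its limit, or null if the request is on time. */
    Stage timedOut(long nowMillis) {
        if (current == null) {
            return null;
        }
        Long limit = limitsMs.get(current);
        boolean overrun = limit != null && nowMillis - stageStartedAt > limit;
        return overrun ? current : null;
    }
}
```

Reporting which stage overran, rather than a single generic timeout, makes it easier to tell a slow backend from a saturated queue or a stalled upload.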

Monitoring & Alerting

Alerts and metrics at second-level granularity, aggregated into InfluxDB.

Comprehensive HTTP monitoring at protocol, service, and application layers (request size, QPS, bandwidth, response codes, connection health, failure rates, traffic jitter).
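As an illustration of second-level metrics, the sketch below buckets requests by wall-clock second and derives QPS and a failure rate from the counts. It assumes that failures mean 5xx status codes and leaves out the reporting step entirely; `SecondLevelMetrics` is a hypothetical class, and nothing here is InfluxDB-specific.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Counts requests and failures per second; safe to call from multiple threads. */
final class SecondLevelMetrics {
    private final Map<Long, LongAdder> total = new ConcurrentHashMap<>();
    private final Map<Long, LongAdder> failed = new ConcurrentHashMap<>();

    /** Records one response; status codes of 500 and above count as failures (an assumption). */
    void record(long epochMillis, int statusCode) {
        long second = epochMillis / 1000;
        total.computeIfAbsent(second, s -> new LongAdder()).increment();
        if (statusCode >= 500) {
            failed.computeIfAbsent(second, s -> new LongAdder()).increment();
        }
    }

    /** Number of requests recorded in the given second. */
    long qps(long second) {
        return sum(total, second);
    }

    /** Share of failed requests in the given second, or 0.0 if there was no traffic. */
    double failureRate(long second) {
        long all = sum(total, second);
        return all == 0 ? 0.0 : (double) sum(failed, second) / all;
    }

    private static long sum(Map<Long, LongAdder> map, long second) {
        LongAdder adder = map.get(second);
        return adder == null ? 0 : adder.sum();
    }
}
```

A production version would also evict old buckets after they are reported, since this sketch keeps every second it has ever seen.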

Performance Optimizations

Object‑pooling to reduce allocation and GC pressure.

Fewer context switches: running in a synchronous configuration reduced CPU context switches by about 20%.

GC tuning: large young generation, SurvivorRatio=2, max tenuring threshold 15, and off‑heap buffers.

Log handling: avoided synchronous flushes from the console appender and capped the async appender's buffer so that logging cannot block Netty I/O threads.
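The object-pooling item can be illustrated with a very small pool. This sketch assumes single-threaded use and leaves resetting object state to the caller; `SimplePool` is a hypothetical class, not a Netty type.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Minimal object pool; thread confinement is an assumption of this sketch. */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;
    private int created;

    SimplePool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    /** Hands out a pooled instance if one is free, otherwise allocates a new one. */
    T borrow() {
        T obj = free.pollFirst();
        if (obj == null) {
            created++;
            obj = factory.get();
        }
        return obj;
    }

    /** Returns an instance for reuse; beyond maxSize it is left to the garbage collector. */
    void giveBack(T obj) {
        if (free.size() < maxSize) {
            free.addFirst(obj);
        }
    }

    /** Number of instances allocated so far. */
    int createdCount() {
        return created;
    }
}
```

Reusing instances this way trades a little bookkeeping for fewer short-lived allocations, which is the GC-pressure effect the item refers to.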
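The log-handling item can be sketched as a bounded buffer whose producer side never blocks. Dropping lines when the buffer is full is an assumption of this example (the source only says the buffer is bounded), and `NonBlockingLogBuffer` is a hypothetical class rather than any logging framework's appender.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Bounded log buffer: I/O threads enqueue without waiting, a writer thread drains. */
final class NonBlockingLogBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    NonBlockingLogBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never blocks the caller: when the buffer is full the line is dropped and counted. */
    boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Called by a separate writer thread; returns null when nothing is queued. */
    String poll() {
        return queue.poll();
    }

    /** Number of lines discarded because the buffer was full. */
    long droppedCount() {
        return dropped.get();
    }
}
```

Exposing the dropped count keeps the trade-off visible: the I/O threads stay responsive, and lost log lines show up as a metric instead of as latency.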

Future Plans

Upgrade to HTTP/2 so that multiple concurrent requests can be multiplexed over a single connection, rather than each in-flight request occupying its own connection.

Continue refining monitoring, alerting, and unified downgrade strategies for full‑site resilience.

Conclusion

The gateway has become a core, standardized component across internet companies. The experience shared here, including the migration from Tomcat to Netty, full-link timeout handling, traffic governance, and performance tuning, offers practical guidance for building high-throughput, low-latency backend gateways.

Figures: gateway evolution diagram; Tomcat NIO architecture; Netty ingress architecture; connection pool diagram; full-link timeout flow; thread model.
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
