How Ximalaya Scaled Its Gateway to 200 B Calls: Async Netty Architecture Lessons

This article details Ximalaya's evolution from a Tomcat‑based gateway to a fully asynchronous Netty implementation, covering architectural redesign, performance bottlenecks, traffic management features, connection‑pool handling, timeout mechanisms, monitoring, and future plans for HTTP/2 and cloud‑native stability.

ITFLY8 Architecture Home

Background

Like many large internet companies, Ximalaya runs a gateway middleware layer so that public features can be rolled out and updated independently of individual business services. The gateway handles over 200 billion calls per day, with peak QPS exceeding 40 k.

The gateway provides core reverse‑proxy capabilities plus public features such as black‑/white‑lists, flow control, authentication, circuit breaking, API publishing, monitoring, and alerting. Additional functions include traffic scheduling, traffic copy, pre‑release, intelligent upgrade/downgrade, and traffic warm‑up.

First Version: Tomcat NIO + AsyncServlet

The initial design used Tomcat with NIO and AsyncServlet to avoid blocking the gateway thread while waiting for backend responses. However, single‑machine QPS plateaued at about 5 k. The main causes were excessive GC from Tomcat's object pools, up to three buffer copies per request, and blocking reads of the request body.

Tomcat Issues

Excessive caching leads to frequent full GC.

Heap‑based buffers cause multiple memory copies when communicating with Netty‑based backends.

Body reads are blocking; Tomcat's NIO model differs from Netty's reactor model.

Second Version: Netty + Full Asynchrony

Replacing Tomcat with Netty eliminated the above problems. The Netty‑based gateway is lock‑free and fully asynchronous, and it is organized into the layers described below.

Ingress Layer

Netty I/O threads handle the HTTP codec, monitor protocol‑level exceptions, and enforce size limits on request lines, headers, and cookies. Oversized requests receive a 400 response.
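The size checks described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name IngressLimits and all three limit values are hypothetical, since the article does not give the real thresholds, and the production gateway performs these checks inside Netty's HTTP decoding rather than in a helper like this.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical ingress check: oversized request line, headers, or cookie yields 400. */
final class IngressLimits {
    // Illustrative limits only; the article does not state the real values.
    static final int MAX_REQUEST_LINE_BYTES = 4096;
    static final int MAX_HEADERS_BYTES = 8192;
    static final int MAX_COOKIE_BYTES = 4096;

    /** Returns 400 if any limit is exceeded, otherwise 200 ("continue processing"). */
    static int check(String requestLine, List<String> headerLines) {
        if (byteLen(requestLine) > MAX_REQUEST_LINE_BYTES) {
            return 400;
        }
        int totalHeaderBytes = 0;
        for (String header : headerLines) {
            int len = byteLen(header);
            totalHeaderBytes += len;
            if (totalHeaderBytes > MAX_HEADERS_BYTES) {
                return 400;
            }
            // Header names are case-insensitive, so match "Cookie:" ignoring case.
            boolean isCookie = header.regionMatches(true, 0, "Cookie:", 0, 7);
            if (isCookie && len > MAX_COOKIE_BYTES) {
                return 400;
            }
        }
        return 200;
    }

    private static int byteLen(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }
}
```

Rejecting at this stage means a malformed or abusive request never reaches the business logic layer, so it costs only the parsing work on the I/O thread.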

Business Logic Layer

Implements the public features as a chain of responsibility that performs no I/O. The chain covers user authentication, black/white‑lists, flow control (token bucket), intelligent circuit breaking, gray release with slow‑start, unified downgrade, traffic scheduling, traffic copy, and log sampling:

User authentication and login verification (API‑level config)

Black/white‑list (global, application, IP, parameter level)

Flow control (automatic token‑bucket and manual)

Intelligent circuit breaking with auto‑upgrade/downgrade

Gray release with pre‑warm windows

Unified downgrade with fine‑grained rule matching

Traffic scheduling and copy for testing and validation

Log sampling for all failed requests
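As an illustration of the token‑bucket flow control listed above, here is a minimal sketch. The class name TokenBucket and its constructor parameters are assumptions made for this example, not the gateway's actual code. Time is passed in explicitly so the behavior is deterministic, and the method is synchronized for brevity even though the real gateway is described as lock‑free.

```java
/** Minimal token bucket using integer arithmetic; one token costs nanosPerToken of credit. */
final class TokenBucket {
    private final long nanosPerToken;   // time needed to earn one token
    private final long maxCreditNanos;  // capacity expressed as time credit
    private long creditNanos;           // currently stored credit
    private long lastNanos;             // timestamp of the last refill

    /** tokensPerSecond must be between 1 and 1_000_000_000; the bucket starts full. */
    TokenBucket(long capacity, long tokensPerSecond, long nowNanos) {
        this.nanosPerToken = 1_000_000_000L / tokensPerSecond;
        this.maxCreditNanos = capacity * nanosPerToken;
        this.creditNanos = maxCreditNanos;
        this.lastNanos = nowNanos;
    }

    /** Refills by elapsed time, then takes one token if available. */
    synchronized boolean tryAcquire(long nowNanos) {
        if (nowNanos > lastNanos) {
            creditNanos = Math.min(maxCreditNanos, creditNanos + (nowNanos - lastNanos));
            lastNanos = nowNanos;
        }
        if (creditNanos >= nanosPerToken) {
            creditNanos -= nanosPerToken;
            return true;
        }
        return false;
    }
}
```

In production code the caller would pass System.nanoTime(). A request that fails tryAcquire would be rejected or downgraded according to the configured rule.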

All filters are in‑memory and initialized at startup, avoiding I/O during request processing.
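The responsibility chain can be sketched as an ordered list of in‑memory filters, where the first filter that rejects a request ends processing. Every type name here (GatewayRequest, GatewayFilter, FilterChain) is hypothetical, as are the status codes chosen for each rejection; the article does not show the gateway's real interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical request view; the real gateway's request type is not shown in the article. */
final class GatewayRequest {
    final String clientIp;
    final String path;
    final boolean loggedIn;

    GatewayRequest(String clientIp, String path, boolean loggedIn) {
        this.clientIp = clientIp;
        this.path = path;
        this.loggedIn = loggedIn;
    }
}

/** One link in the chain: returns an HTTP status to reject, or 0 to pass the request on. */
interface GatewayFilter {
    int apply(GatewayRequest request);
}

final class FilterChain {
    private final List<GatewayFilter> filters;

    FilterChain(List<GatewayFilter> filters) {
        this.filters = new ArrayList<>(filters); // built once at startup
    }

    /** Runs filters in order; the first non-zero status short-circuits. Returns 0 if all pass. */
    int run(GatewayRequest request) {
        for (GatewayFilter filter : filters) {
            int status = filter.apply(request);
            if (status != 0) {
                return status;
            }
        }
        return 0;
    }

    /** Example chain: an IP blacklist followed by a login check, both pure in-memory lookups. */
    static FilterChain example(Set<String> blockedIps, Set<String> loginRequiredPaths) {
        List<GatewayFilter> filters = new ArrayList<>();
        filters.add(r -> blockedIps.contains(r.clientIp) ? 403 : 0);
        filters.add(r -> loginRequiredPaths.contains(r.path) && !r.loggedIn ? 401 : 0);
        return new FilterChain(filters);
    }
}
```

Because each filter only reads preloaded in‑memory state, the whole chain can run on the I/O thread without blocking it.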

Service Call Layer

Uses Netty's connection pool with lock‑free acquisition and release. A connection is closed when the response carries Connection: close, or on idle timeout, read timeout, write timeout, or FIN/RESET. The write timeout keeps a slowly written large POST body from blocking backend Tomcat services.
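To make "lock‑free acquisition and release" concrete, the following sketch shows one common way to build such a pool with a concurrent queue and a compare‑and‑set counter. It is not the gateway's code and not Netty's pool implementation; LockFreePool and its methods are hypothetical names, and the generic type C stands in for a channel.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Sketch of a bounded pool whose acquire and release paths use CAS instead of locks. */
final class LockFreePool<C> {
    private final ConcurrentLinkedDeque<C> idle = new ConcurrentLinkedDeque<>();
    private final AtomicInteger total = new AtomicInteger(); // idle + in-use connections
    private final int maxTotal;
    private final Supplier<C> factory;

    LockFreePool(int maxTotal, Supplier<C> factory) {
        this.maxTotal = maxTotal;
        this.factory = factory;
    }

    /** Reuses an idle connection, creates one if under the cap, or returns null when exhausted. */
    C acquire() {
        C pooled = idle.pollFirst();
        if (pooled != null) {
            return pooled;
        }
        while (true) {
            int current = total.get();
            if (current >= maxTotal) {
                return null; // caller may queue the request or fail fast
            }
            if (total.compareAndSet(current, current + 1)) {
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    total.decrementAndGet(); // give the reserved slot back
                    throw e;
                }
            }
        }
    }

    /** Returns a healthy connection for reuse. */
    void release(C connection) {
        idle.offerFirst(connection);
    }

    /** Frees the slot of a connection that was closed (Connection: close, timeout, FIN/RESET). */
    void discard() {
        total.decrementAndGet();
    }

    int total() {
        return total.get();
    }
}
```

The caller is expected to call discard() instead of release() for any connection that hit one of the close conditions listed above, so that a dead connection is never handed out again.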

Full‑Link Timeout Mechanism

Covers every stage of a request: protocol parsing, queue waiting, connection establishment, writability checks, write timeout, and response timeout.
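One way to reason about timeouts across these stages is a single deadline per request that each stage consults. The sketch below assumes such a shared budget; the article only lists the stages and does not say whether the gateway uses a shared budget or independent per‑stage timers. RequestDeadline and the millisecond values in the usage note are hypothetical.

```java
/** Sketch of a per-request deadline consulted by every stage of the call path. */
final class RequestDeadline {
    private final long deadlineNanos;

    RequestDeadline(long startNanos, long totalBudgetMillis) {
        this.deadlineNanos = startNanos + totalBudgetMillis * 1_000_000L;
    }

    /** Milliseconds left at nowNanos, never negative. */
    long remainingMillis(long nowNanos) {
        return Math.max(0L, (deadlineNanos - nowNanos) / 1_000_000L);
    }

    /** Timeout for one stage: the stage's own cap, clipped to what is left of the budget. */
    long stageTimeoutMillis(long nowNanos, long stageCapMillis) {
        return Math.min(stageCapMillis, remainingMillis(nowNanos));
    }

    /** True once the overall budget is used up; later stages should be skipped. */
    boolean expired(long nowNanos) {
        return nowNanos - deadlineNanos >= 0;
    }
}
```

For example, with a 1000 ms budget and a 200 ms cap on connection establishment, a request that already spent 900 ms waiting in a queue would get only 100 ms to connect, and a request that spent the full budget would be failed before any backend call is attempted.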

Monitoring & Alerting

Provides metrics at one‑second granularity for protocol, service, QPS, latency (tp99/tp999), bandwidth, response codes, connection usage, failure rates, and traffic jitter, all stored in InfluxDB.
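The tp99 and tp999 figures are latency percentiles computed over the samples in each reporting window. A minimal sketch using the nearest‑rank method follows. The class name LatencyWindow is hypothetical, the choice of nearest‑rank is an assumption (the article does not say which percentile definition is used), and writing the result to InfluxDB is not shown.

```java
import java.util.Arrays;

/** Nearest-rank percentiles over the latency samples collected in one window. */
final class LatencyWindow {
    /**
     * Returns the value at the given per-mille rank (990 = tp99, 999 = tp999).
     * Integer arithmetic avoids floating-point rounding at rank boundaries.
     */
    static long percentile(long[] samplesMillis, int permille) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples in window");
        }
        if (permille < 1 || permille > 1000) {
            throw new IllegalArgumentException("permille must be in 1..1000");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // rank = ceil(permille * n / 1000), 1-based
        int rank = (int) (((long) permille * sorted.length + 999) / 1000);
        return sorted[rank - 1];
    }
}
```

Sorting every window is fine for a sketch; a gateway recording this many samples per second would more likely keep a fixed set of latency buckets and read the percentile from the bucket counts.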

Performance Optimizations

Object‑pooling to reduce allocation and GC pressure.

Context‑switch reduction: optional synchronous execution of business logic cuts CPU context switches by ~20%.

GC tuning: a large young generation, SurvivorRatio=2, a maximum tenuring threshold of 15, and off‑heap buffers.

Finalize‑based socket cleanup can delay GC; explicit connection close is preferred.

Logging bottlenecks mitigated by disabling immediate flush and avoiding blocking AsyncAppender buffers.

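The finalizer behind the socket cleanup issue above, as it appears in the socket implementation of older JDK versions (java.net.AbstractPlainSocketImpl in JDK 8):

/**
 * Cleans up if the user forgets to close it.
 */
protected void finalize() throws IOException {
    close();
}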
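By contrast, closing a connection explicitly releases it as soon as the work is done instead of leaving cleanup to a finalizer. A minimal sketch with try‑with‑resources follows. TrackedConnection is a hypothetical stand‑in for a socket‑like resource, and its counter exists only so that the effect of close() is observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a socket-like resource; counts open instances to make close() observable. */
final class TrackedConnection implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    TrackedConnection() {
        OPEN.incrementAndGet();
    }

    String send(String payload) {
        if (closed) {
            throw new IllegalStateException("connection closed");
        }
        return "echo:" + payload;
    }

    /** Idempotent close: releases the resource immediately instead of waiting for GC. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

final class ExplicitCloseExample {
    /** try-with-resources calls close() on every exit path, including exceptions. */
    static String call(String payload) {
        try (TrackedConnection connection = new TrackedConnection()) {
            return connection.send(payload);
        }
    }
}
```

With explicit close on every path, the finalizer becomes a safety net that should rarely do any work.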

Future Plans

Migrate to HTTP/2 to enable multiplexed streams over a single connection, continue improving monitoring accuracy, and enhance unified downgrade mechanisms for full‑site fault tolerance.

Conclusion

Gateways have become a standard component of large‑scale internet services. This article shared practical lessons on architecture, performance tuning, and operational stability, and further discussion is welcome. Ongoing work includes multi‑active deployment, cloud‑native support, and a stability platform.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
