Scaling Ximalaya’s Gateway to 200B Daily Calls: Architecture & Performance Insights

This article traces the evolution of Ximalaya's high-traffic gateway from a Tomcat-based prototype to a fully asynchronous Netty design. It covers core features, full-link timeout handling, monitoring, performance optimizations, and future plans for HTTP/2 and unified downgrade.

Architecture Talk

1 Background

Gateways are mature middleware at many internet companies. They host the cross-cutting logic shared by all services, so that logic can be updated quickly without redeploying each service. Ximalaya's gateway processes over 200 billion calls per day, peaks at more than 40,000 QPS, and serves more than 500 web services.

2 First version: Tomcat NIO + AsyncServlet

The initial design used Tomcat together with an HttpNioClient. The key requirement was non-blocking request handling, so that threads would not block while calling downstream services. However, single-machine throughput was capped at about 5,000 QPS because of full GCs caused by Tomcat's large pool of request processors and by memory-copy overhead.

Tomcat issues

Too many cached objects leading to GC pressure.

Heap‑to‑off‑heap memory copy when interacting with Netty‑based back‑ends.

Blocking body reads.

HttpNioClient issues

Lock‑based connection acquisition/release causing contention.

3 Second version: Netty + Full Asynchronous

The access and service-call layers were switched to Netty, giving a fully asynchronous, lock-free, layered architecture.

The access layer uses Netty I/O threads for HTTP encoding/decoding and protocol-level monitoring. It validates request size and handles blacklists and whitelists, rate limiting, authentication, and API publishing. Abnormal or attack requests are sampled, logged, and alerted on.

The business-logic layer implements the shared features as a filter chain that performs no I/O, including:

User authentication and login validation.

Blacklists and whitelists per IP, application, and parameter.

Rate limiting with token‑bucket algorithm.

Intelligent circuit breaking (enhanced Hystrix) with automatic downgrade.

Gray release with slow‑start for new machines.

Unified downgrade rules down to request‑header level, integrated with Varnish.

Traffic scheduling and traffic copying for testing.

Request‑log sampling for failures.
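
As an illustration of the token-bucket rate limiting listed above, here is a minimal sketch. The class name, constructor parameters, and injectable clock are assumptions made for this example and are not taken from Ximalaya's code. It uses synchronized for brevity; a lock-free gateway would more likely keep one bucket per I/O thread or use compare-and-set.

```java
import java.util.function.LongSupplier;

/** Minimal token-bucket limiter; names and parameters are illustrative. */
final class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerNano;   // tokens added per nanosecond
    private final LongSupplier clock;     // injectable nanosecond clock (eases testing)
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double tokensPerSecond, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.clock = clock;
        this.tokens = capacity;           // start with a full bucket
        this.lastRefillNanos = clock.getAsLong();
    }

    /** Returns true if the request may proceed, false if it should be rejected. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Refill in proportion to elapsed time, never exceeding the capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would typically be System::nanoTime, and a rejected request would be answered immediately rather than queued.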

All filters are initialized at startup; execution is in‑memory, avoiding I/O. Rule updates trigger real‑time refresh via a dedicated thread.
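
The in-memory filter chain with real-time rule refresh could be sketched as follows. All type names here (GatewayRequest, GatewayFilter, IpBlacklistFilter, FilterChain) are hypothetical, and the atomic snapshot swap is just one plausible way for a refresh thread to publish new rules without locking the request path.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical request view; a real gateway would wrap the Netty HTTP request. */
final class GatewayRequest {
    final String clientIp;
    final String path;

    GatewayRequest(String clientIp, String path) {
        this.clientIp = clientIp;
        this.path = path;
    }
}

/** A filter decides in memory, without I/O, whether the request may continue. */
interface GatewayFilter {
    boolean allow(GatewayRequest request);
}

/** Blacklist filter whose rule set is replaced atomically by a refresh thread. */
final class IpBlacklistFilter implements GatewayFilter {
    private final AtomicReference<Set<String>> blockedIps = new AtomicReference<>(Set.of());

    /** Called by the rule-refresh thread; readers never see a half-updated set. */
    void refresh(Set<String> newRules) {
        blockedIps.set(Set.copyOf(newRules));
    }

    @Override
    public boolean allow(GatewayRequest request) {
        return !blockedIps.get().contains(request.clientIp);
    }
}

/** Immutable chain built once at startup. */
final class FilterChain {
    private final List<GatewayFilter> filters;

    FilterChain(List<GatewayFilter> filters) {
        this.filters = List.copyOf(filters);
    }

    /** Runs the filters in order and stops at the first rejection. */
    boolean allow(GatewayRequest request) {
        for (GatewayFilter filter : filters) {
            if (!filter.allow(request)) {
                return false;
            }
        }
        return true;
    }
}
```

Because each filter reads an immutable snapshot, the I/O threads never wait on the thread that refreshes the rules.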

4 Full‑link timeout mechanism

The gateway defines a chain of timeout checks: protocol parsing, queue wait, connection establishment, waiting for connection, pre‑write timeout, write timeout, and response timeout.
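
One way to picture this chain is a per-request tracker that knows which stage is in progress and how long that stage is allowed to take. The sketch below is an assumption-laden illustration: the Stage names mirror the list above, but the StageTimeouts class, its methods, and any budget values are invented for this example. In a Netty gateway the check would usually be driven by a task scheduled on the event loop.

```java
import java.util.EnumMap;
import java.util.Map;

/** Stages named after the timeout chain in the text. */
enum Stage { PROTOCOL_PARSE, QUEUE_WAIT, CONNECT, ACQUIRE_CONNECTION, PRE_WRITE, WRITE, RESPONSE }

/** Tracks the current stage of one request and its time budget (hypothetical class). */
final class StageTimeouts {
    private final Map<Stage, Long> budgetMillis = new EnumMap<>(Stage.class);
    private Stage current;
    private long stageStartMillis;

    /** Sets the time budget for one stage; returns this for chaining. */
    StageTimeouts budget(Stage stage, long millis) {
        budgetMillis.put(stage, millis);
        return this;
    }

    /** Marks the start of a stage. */
    void enter(Stage stage, long nowMillis) {
        current = stage;
        stageStartMillis = nowMillis;
    }

    /** Returns the stage that has exceeded its budget, or null if none has. */
    Stage expired(long nowMillis) {
        if (current == null) {
            return null;
        }
        Long budget = budgetMillis.get(current);
        boolean over = budget != null && nowMillis - stageStartMillis > budget;
        return over ? current : null;
    }
}
```

Reporting which stage expired, rather than a generic timeout, makes it possible to tell a slow downstream service apart from an exhausted connection pool.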

5 Monitoring and alerting

Metrics are reported per second to a management system and stored in InfluxDB. Monitoring covers protocol‑level attacks, oversized requests, latency (including tp99, tp999), QPS, bandwidth, response codes (especially 400/404), connection statistics, failure rates, and traffic jitter.
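
For readers unfamiliar with tp99 and tp999: they are the latencies below which 99 % and 99.9 % of requests complete. A minimal nearest-rank calculation over one reporting window might look like the sketch below. The LatencyWindow class is hypothetical, and a production system would more likely use a histogram than sort raw samples every second.

```java
import java.util.Arrays;

/** Percentile helper for one reporting window (hypothetical class). */
final class LatencyWindow {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMillis.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Calling percentile(samples, 99) gives tp99 and percentile(samples, 99.9) gives tp999 for the window.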

6 Overall architecture

7 Performance optimization practices

Object pool: Reuse frequently created objects (e.g., thread-pool tasks, StringBuffer) to reduce allocation and GC pressure.
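
A minimal object pool, sketched under the assumption that each I/O thread owns its own pool so no locking is needed. SimplePool and its methods are illustrative names; Netty also ships its own Recycler utility for this purpose.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Tiny pool that is not thread-safe by design: keep one instance per I/O thread. */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Returns a pooled instance if one is idle, otherwise creates a new one. */
    T acquire() {
        T obj = free.pollFirst();
        return obj != null ? obj : factory.get();
    }

    /** Returns an instance to the pool; surplus instances are dropped for the GC. */
    void release(T obj) {
        if (free.size() < maxIdle) {
            free.addFirst(obj);
        }
    }
}
```

The caller is responsible for resetting an object before releasing it, for example by calling setLength(0) on a pooled StringBuilder.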

Context switching: Netty I/O threads are kept lightweight; an optional configuration can run business logic on the I/O threads, reducing context switches by about 20 %.

GC optimization: A large young generation, SurvivorRatio = 2, a max tenuring threshold of 15, and the use of off-heap memory keep most objects out of the old generation. However, the finalize method on sockets can delay their reclamation, causing old-gen growth.

/**
 * Cleans up if the user forgets to close it.
 */
protected void finalize() throws IOException {
    close();
}

Logging: A synchronous console appender, or an async appender whose bounded queue has filled up, can block Netty I/O threads under high load, so logging must be designed carefully.
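
One common way to keep I/O threads from blocking on a full log queue is to drop the line and count the drop instead of waiting. The sketch below assumes that trade-off is acceptable; DroppingLogQueue is a hypothetical class, not part of any logging framework, and the source does not say which approach Ximalaya chose.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Bounded log buffer that never blocks the producer (hypothetical class). */
final class DroppingLogQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingLogQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called from I/O threads: returns immediately and drops the line when full. */
    boolean log(String line) {
        boolean accepted = queue.offer(line);   // offer() never waits, unlike put()
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Called from a dedicated writer thread; returns null when nothing is queued. */
    String poll() {
        return queue.poll();
    }

    /** Number of lines dropped so far, suitable for exporting as a metric. */
    long droppedCount() {
        return dropped.get();
    }
}
```

Exporting the drop count as a metric keeps the loss visible even though individual lines are gone.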

8 Future plans

Upgrade to HTTP/2 to enable multiplexed connections, further improve monitoring/alerting accuracy, and enhance unified downgrade capabilities.

9 Conclusion

Gateways are now a standard component in internet companies. The article shares practical experiences and solutions, and invites interested engineers to join the ongoing multi‑active project.

Written by

Architecture Talk
