Evolution and Performance Optimization of Ximalaya's High‑Throughput HTTP Gateway

This article details the design evolution, architectural redesign, and performance‑tuning techniques of Ximalaya's gateway, from an initial Tomcat NIO implementation to a fully asynchronous Netty‑based solution. It covers traffic management, timeout handling, monitoring, and a planned HTTP/2 migration.

High Availability Architecture

Feb 22, 2021

Background – Gateways are essential middleware in large‑scale internet companies: they handle public‑facing traffic and let shared features roll out quickly without every downstream service having to change. Ximalaya processes over 200 billion calls daily, with peak QPS exceeding 40 k, and its gateway supports blacklists and whitelists, flow control, authentication, circuit breaking, API publishing, monitoring, and alerting.

First Version: Tomcat NIO + AsyncServlet – The initial gateway used Tomcat with NIO and AsyncServlet so that threads were not blocked on backend calls. However, it suffered from heavy memory usage, frequent full GCs, and three copies of request data, caused by Tomcat's request buffering and blocking body reads.

Problems with Tomcat

Excessive object caching leads to GC pressure.

Heap‑based buffering means request data is copied several times.

Body reading is blocking, unlike Netty’s non‑blocking model.

Second Version: Netty + Full Asynchrony – Switching to Netty eliminated the above issues, providing a lock‑free, layered architecture with separate access, business‑logic, and service‑call layers.

Access Layer – Netty I/O threads handle HTTP codec, enforce request size limits, and quickly reject oversized or malformed requests with a 400 response.
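
The kind of early rejection described for the access layer can be sketched in plain Java. This is not the gateway's Netty code: the class name RequestGuard, both size limits, and the convention of returning 0 for "accept" are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of an access-layer guard. Limits and names are hypothetical. */
final class RequestGuard {
    static final int MAX_HEAD_BYTES = 8 * 1024;     // assumed limit for request line + headers
    static final long MAX_BODY_BYTES = 1024 * 1024; // assumed limit for the body

    /** Returns 0 to accept the request, or 400 to reject it immediately. */
    static int check(String rawHead, String contentLength) {
        if (rawHead.getBytes(StandardCharsets.ISO_8859_1).length > MAX_HEAD_BYTES) {
            return 400; // oversized request line or headers
        }
        // A well-formed request line looks like "METHOD target HTTP/x.y".
        String requestLine = rawHead.split("\r\n", 2)[0];
        String[] parts = requestLine.split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return 400; // malformed request line
        }
        if (contentLength != null) {
            try {
                long n = Long.parseLong(contentLength.trim());
                if (n < 0 || n > MAX_BODY_BYTES) {
                    return 400; // declared body is too large or negative
                }
            } catch (NumberFormatException e) {
                return 400; // Content-Length is not a number
            }
        }
        return 0;
    }
}
```

Because the check only looks at the head and the declared length, a bad request can be refused before any body bytes are buffered.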

Business Logic Layer – Implements shared features (authentication, blacklist/whitelist, flow control, intelligent circuit breaking, gray release, unified downgrade, traffic scheduling, traffic copy, log sampling) as a filter chain; the filters themselves perform no I/O.
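
A filter chain of this shape might look like the following minimal sketch. The Request fields, the Filter interface, and the two example filters are hypothetical; the only property taken from the text is that each filter is a pure in-memory decision with no I/O.

```java
import java.util.List;
import java.util.Set;

final class FilterChainSketch {
    /** Minimal request view. The fields are hypothetical. */
    static final class Request {
        final String clientIp;
        final String token;
        Request(String clientIp, String token) {
            this.clientIp = clientIp;
            this.token = token;
        }
    }

    /** A filter returns 0 to continue, or an HTTP status to stop the chain. */
    interface Filter {
        int apply(Request r);
    }

    /** Runs filters in order and stops at the first non-zero status. */
    static int run(List<Filter> chain, Request r) {
        for (Filter f : chain) {
            int status = f.apply(r);
            if (status != 0) {
                return status;
            }
        }
        return 0;
    }

    /** Rejects requests from blocked addresses using an in-memory set. */
    static Filter blacklist(Set<String> blockedIps) {
        return r -> blockedIps.contains(r.clientIp) ? 403 : 0;
    }

    /** Rejects requests whose token is not in the in-memory set. */
    static Filter auth(Set<String> validTokens) {
        return r -> validTokens.contains(r.token) ? 0 : 401;
    }
}
```

Keeping filters free of I/O means the whole chain can run to completion on whichever thread calls run, without ever waiting on a remote service.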

Service Call Layer – Uses Netty’s connection pool for lock‑free acquisition/release, ensuring asynchronous remote calls.

Asynchronous Push – Each request creates a context bound to a connection; when the backend response arrives, the context is used to push the response back to the client, keeping the original request thread free.
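
The context-based push can be illustrated without Netty. In this sketch a Consumer<String> stands in for the client connection, and the request id used as the lookup key is an assumption; the article does not say how contexts are keyed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class AsyncPushSketch {
    /** Per-request context that remembers how to write back to the client. */
    static final class RequestContext {
        final long requestId;
        final Consumer<String> clientWriter; // stands in for a client connection
        RequestContext(long requestId, Consumer<String> clientWriter) {
            this.requestId = requestId;
            this.clientWriter = clientWriter;
        }
    }

    private final Map<Long, RequestContext> pending = new ConcurrentHashMap<>();

    /** Request path: register the context and return without waiting. */
    void onClientRequest(long requestId, Consumer<String> clientWriter) {
        pending.put(requestId, new RequestContext(requestId, clientWriter));
        // A real gateway would now write the request to a backend connection.
    }

    /** Response path: may run on a different thread than the request path. */
    boolean onBackendResponse(long requestId, String body) {
        RequestContext ctx = pending.remove(requestId);
        if (ctx == null) {
            return false; // unknown id, or the request was already completed
        }
        ctx.clientWriter.accept(body);
        return true;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Removing the context before writing guarantees that a late or duplicate backend response cannot be pushed to the client twice.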

Connection Pool – Manages HTTP connections to backends and ensures that a connection is closed only after its response has been fully received. It handles scenarios such as Connection: close, idle timeout, read/write timeout, and FIN/RESET.
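
One way to picture the deferred close is the sketch below. It uses a JDK concurrent deque rather than Netty's pool, and the names Conn, requestClose, onResponseComplete, and onFailure are hypothetical. A connection is assumed to be owned by one request at a time, so its flags need no synchronization.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

final class ConnPoolSketch {
    static final class Conn {
        private boolean closeRequested; // e.g. the backend sent "Connection: close"
        private boolean closed;

        /** Records that the connection must not be reused; does not close yet. */
        void requestClose() {
            closeRequested = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    private final ConcurrentLinkedDeque<Conn> idle = new ConcurrentLinkedDeque<>();
    private final AtomicInteger created = new AtomicInteger();

    /** Lock-free acquire: reuse an idle connection or create a new one. */
    Conn acquire() {
        Conn c = idle.pollFirst();
        if (c != null) {
            return c;
        }
        created.incrementAndGet();
        return new Conn();
    }

    /** Called once the whole response has been read. */
    void onResponseComplete(Conn c) {
        if (c.closeRequested) {
            c.closed = true; // the deferred close happens only now
        } else {
            idle.offerFirst(c); // healthy connection goes back to the pool
        }
    }

    /** Called on timeout, FIN, or RESET: the connection is never reused. */
    void onFailure(Conn c) {
        c.closed = true;
    }

    int createdCount() {
        return created.get();
    }
}
```

The point of the two-step close is that seeing Connection: close in the headers only marks the connection; the body can still be read to the end before the socket goes away.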

Full‑Link Timeout Mechanism – Covers every stage of a request: protocol parsing, queue waiting, connection establishment, request write, and response read each have their own timeout.
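
A per-stage timeout check could be modelled as follows. The stage names mirror the list above, but every budget value is invented for illustration, and the clock is passed in explicitly so the logic stays deterministic. How the real gateway schedules these checks is not stated in the article.

```java
import java.util.EnumMap;
import java.util.Map;

final class TimeoutSketch {
    enum Stage { PARSE, QUEUE, CONNECT, WRITE, RESPONSE }

    private final Map<Stage, Long> budgetMillis = new EnumMap<>(Stage.class);
    private Stage current;
    private long enteredAt;

    TimeoutSketch() {
        // Hypothetical budgets in milliseconds, not values from the article.
        budgetMillis.put(Stage.PARSE, 1000L);
        budgetMillis.put(Stage.QUEUE, 200L);
        budgetMillis.put(Stage.CONNECT, 500L);
        budgetMillis.put(Stage.WRITE, 1000L);
        budgetMillis.put(Stage.RESPONSE, 3000L);
    }

    /** Marks the start of a stage for this request. */
    void enter(Stage stage, long nowMillis) {
        current = stage;
        enteredAt = nowMillis;
    }

    /** Returns the stage that exceeded its budget, or null if none did. */
    Stage checkTimeout(long nowMillis) {
        if (current == null) {
            return null;
        }
        return nowMillis - enteredAt > budgetMillis.get(current) ? current : null;
    }
}
```

Reporting which stage timed out, rather than a single generic timeout, makes it possible to tell a slow backend (RESPONSE) apart from an overloaded gateway (QUEUE).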

Monitoring & Alerting – Provides per‑second metrics and alerts at the protocol level (attack detection, oversized requests) and at the application level (latency, QPS, bandwidth, response codes, connection stats, failure rates, traffic jitter).
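
Metrics at one-second granularity can be collected with per-second buckets, as in this minimal sketch. The class and method names are hypothetical, a status of 500 or above is assumed to count as a failure, and old buckets are never evicted here, which a production collector would need to do.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class SecondMetrics {
    private final ConcurrentHashMap<Long, LongAdder> requests = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Long, LongAdder> errors = new ConcurrentHashMap<>();

    /** Records one finished request in the bucket for its second. */
    void record(long epochMillis, int status) {
        long sec = epochMillis / 1000;
        requests.computeIfAbsent(sec, k -> new LongAdder()).increment();
        if (status >= 500) { // assumption: 5xx counts as a failure
            errors.computeIfAbsent(sec, k -> new LongAdder()).increment();
        }
    }

    /** Number of requests recorded during the given second. */
    long requestsIn(long sec) {
        LongAdder a = requests.get(sec);
        return a == null ? 0 : a.sum();
    }

    /** Fraction of requests in the given second that failed. */
    double failureRate(long sec) {
        long total = requestsIn(sec);
        if (total == 0) {
            return 0.0;
        }
        LongAdder e = errors.get(sec);
        return (e == null ? 0 : e.sum()) / (double) total;
    }

    /** True when the failure rate for that second exceeds the threshold. */
    boolean shouldAlert(long sec, double threshold) {
        return failureRate(sec) > threshold;
    }
}
```

LongAdder is used instead of a single atomic counter because many threads increment the same bucket at once and only the alerting path needs the sum.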

Performance Optimizations

Object pool for reusable objects (thread‑pool tasks, StringBuffer, etc.).

Reduced context switches by optionally running business logic directly on Netty I/O threads, which cut context switches by about 20%.

GC tuning: a large young generation, -XX:SurvivorRatio=2, a maximum tenuring age of 15 (-XX:MaxTenuringThreshold=15), and careful handling of socket finalizers to avoid old‑gen growth.

Logging adjustments: avoid synchronous console appender flushes and prevent Log4j AsyncAppender buffer blocking.
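
The buffer-blocking problem in the logging item comes down to what happens when the async buffer is full: the caller either waits or the event is dropped. The sketch below shows the non-blocking choice with a plain JDK queue. It is not Log4j code, and the class name and the dropped-event counter are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class DroppingLogBuffer {
    private final ArrayBlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingLogBuffer(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never blocks the caller: offer() fails fast when the buffer is full. */
    boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) {
            dropped.incrementAndGet(); // count the loss instead of waiting
        }
        return accepted;
    }

    /** Called by a background writer thread; returns null when empty. */
    String poll() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Using put() instead of offer() would make every request thread wait whenever the writer falls behind, which is the stall the item above warns about.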

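The socket finalizer mentioned under GC tuning is, in JDK 8, the finalize() method of java.net.AbstractPlainSocketImpl. An object that overrides finalize() can be reclaimed only after its finalizer has run, so socket objects can outlive their connection and be promoted into the old generation:

/**
 * Cleans up if the user forgets to close it.
 */
protected void finalize() throws IOException {
    close();
}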
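
Returning to the object‑pool item in the list above, a pool for reusable buffers can be as small as the following sketch. The class name, the initial capacity of 256, and the rule of discarding oversized buffers are assumptions, not details from the article.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

final class StringBufferPool {
    private final ConcurrentLinkedQueue<StringBuffer> free = new ConcurrentLinkedQueue<>();
    private final int maxRetainedCapacity;

    StringBufferPool(int maxRetainedCapacity) {
        this.maxRetainedCapacity = maxRetainedCapacity;
    }

    /** Hands out a pooled buffer, or a fresh one when the pool is empty. */
    StringBuffer borrow() {
        StringBuffer sb = free.poll();
        return sb != null ? sb : new StringBuffer(256);
    }

    /** Resets the buffer and keeps it, unless it has grown too large. */
    void giveBack(StringBuffer sb) {
        if (sb.capacity() > maxRetainedCapacity) {
            return; // let oversized buffers be garbage collected
        }
        sb.setLength(0);
        free.offer(sb);
    }
}
```

Capping the retained capacity keeps one unusually large request from pinning a large buffer in the pool indefinitely.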

Future Plans – Migrate to HTTP/2 for multiplexed connections, continue improving monitoring/alerting, and enhance universal downgrade mechanisms to ensure graceful degradation across the entire platform.

Conclusion – Gateways are now a standard component in internet companies; the article shares practical insights, challenges, and ongoing efforts such as multi‑active, cloud‑native, and stability platforms, while also inviting talent to join the Ximalaya engineering team.
