Design and Performance Optimization of a Custom API Gateway (DAG)
This article analyzes the limitations of Spring Cloud Gateway (SCG) and proposes a self‑developed API gateway architecture called DAG. It details DAG's core components (request context, filter chain, async client, routing, and connection management) and reports performance tests in which DAG achieves significantly higher throughput, lower latency, and better stability than the original SCG solution.
The legacy Spring Cloud Gateway (SCG) suffered from security risks, poor routing performance, memory leaks, complex reactive programming, and high latency during cold starts, especially as the number of registered routes grew to tens of thousands.
To address these issues, the team designed a new API gateway (DAG) with the following principles: support massive dynamic route registration, simplify the programming model, achieve at least SCG‑level performance, ensure long‑term stability without memory leaks, provide extensible timeout and protocol handling, and separate data flow from control flow.
Core Design
• Request Context : a DefaultServerWebExchange object encapsulates client‑to‑proxy channel, proxy‑to‑client channel, event loop, request, response, and attributes.
• Filter Chain : request and response filters each carry a unique order value, so execution order is deterministic. The chain runs asynchronously: request handling is suspended until the downstream RPC response arrives, and the response filters run after that.
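• AsyncClient : a generic, protocol‑agnostic client built on Netty that manages connection pools and supports non‑blocking RPC calls, timeout scheduling, and callback handling. Sample code (the inbound handler that completes a pending call):
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, Object msg) throws Exception {
        // Look up the in-flight request that was bound to this channel when it was sent
        AsyncContext asyncContext = ctx.attr(AsyncClient.ASYNC_REQUEST_KEY).get();
        // Mark the response as received
        asyncContext.state = AsyncContext.STATE.RECEIVED;
        // Return the channel to the connection pool before waking the caller
        asyncContext.releaseChannel();
        // Complete the promise, which resumes the suspended filter chain
        asyncContext.responsePromise.setSuccess(msg);
    }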
• Routing : routes are stored in memory and matched using O(1) hash lookup based on the request path, avoiding the O(N) iteration of SCG. The Route class holds the route ID, skip count, and target URI.
• Thread‑Per‑Core Model : each request is processed entirely within a single Netty worker event‑loop thread, eliminating cross‑thread data races and reducing context‑switch overhead.
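The routing and filter‑chain bullets above can be illustrated with a minimal sketch. It is not DAG's actual code: the class names (RouteTable, FilterChain, GatewaySketch), the string‑based filters, the example host, and the use of CompletableFuture in place of Netty promises are all assumptions made for illustration. The Route fields follow the article's description, and the meaning of the skip count is left open because the article does not define it.

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical route holder; fields mirror the article's description only.
final class Route {
    final String id;
    final int skipCount;   // semantics not specified in the article
    final URI targetUri;

    Route(String id, int skipCount, URI targetUri) {
        this.id = id;
        this.skipCount = skipCount;
        this.targetUri = targetUri;
    }
}

// Exact-path route table: one hash lookup per request instead of scanning all routes.
final class RouteTable {
    private final Map<String, Route> byPath = new ConcurrentHashMap<>();

    void register(String path, Route route) { byPath.put(path, route); }

    Route match(String path) { return byPath.get(path); } // null when no route matches
}

// Filters keyed by a unique order value; a TreeMap iterates in ascending key order.
final class FilterChain {
    private final TreeMap<Integer, UnaryOperator<String>> requestFilters = new TreeMap<>();
    private final TreeMap<Integer, UnaryOperator<String>> responseFilters = new TreeMap<>();

    void addRequestFilter(int order, UnaryOperator<String> f) { put(requestFilters, order, f); }

    void addResponseFilter(int order, UnaryOperator<String> f) { put(responseFilters, order, f); }

    private static void put(TreeMap<Integer, UnaryOperator<String>> m, int order,
                            UnaryOperator<String> f) {
        // Reject duplicate order values so the execution order is never ambiguous
        if (m.putIfAbsent(order, f) != null) {
            throw new IllegalArgumentException("duplicate filter order: " + order);
        }
    }

    // Runs the request filters, hands off to the async call, and resumes with the
    // response filters only when the returned future completes.
    CompletableFuture<String> handle(String request,
                                     Function<String, CompletableFuture<String>> rpc) {
        String req = request;
        for (UnaryOperator<String> f : requestFilters.values()) {
            req = f.apply(req);
        }
        return rpc.apply(req).thenApply(resp -> {
            String out = resp;
            for (UnaryOperator<String> f : responseFilters.values()) {
                out = f.apply(out);
            }
            return out;
        });
    }
}

final class GatewaySketch {
    public static void main(String[] args) {
        RouteTable routes = new RouteTable();
        routes.register("/orders/list",
                new Route("orders", 1, URI.create("http://orders.internal:8080")));

        FilterChain chain = new FilterChain();
        chain.addRequestFilter(10, r -> r + "|auth");
        chain.addRequestFilter(20, r -> r + "|trace");
        chain.addResponseFilter(10, r -> r + "|metrics");

        Route route = routes.match("/orders/list");
        CompletableFuture<String> pending = new CompletableFuture<>(); // stands in for the RPC
        String[] sent = new String[1];
        CompletableFuture<String> result = chain.handle("GET " + route.targetUri, req -> {
            sent[0] = req;
            return pending;
        });

        System.out.println(result.isDone()); // false: chain is suspended
        pending.complete("200 OK");          // downstream response arrives
        System.out.println(sent[0]);         // GET http://orders.internal:8080|auth|trace
        System.out.println(result.join());   // 200 OK|metrics
    }
}
```

Keying filters by order in a sorted map makes a duplicate order value a registration‑time error rather than a silent ambiguity. The exact‑path hash map trades pattern matching for constant expected lookup cost, which is the trade‑off the routing bullet describes.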
Performance Evaluation
Load tests (wrk with 32 threads and 1000 connections) showed DAG sustaining ~45k QPS at 80% CPU with an average response time (ART) of 19 ms, while SCG capped at ~11k QPS at 95% CPU with an ART of 54 ms. Memory leak issues were eliminated, and cold‑start latency spikes were reduced by 99% after pre‑initializing shared contexts.
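As a quick check on the reported figures, the throughput ratio and the relative ART reduction work out roughly as follows. The inputs are the approximate values quoted above, so the results are rounded.

```latex
\[
\frac{45\,\text{k QPS}}{11\,\text{k QPS}} \approx 4.1,
\qquad
\frac{54\,\text{ms} - 19\,\text{ms}}{54\,\text{ms}} = \frac{35}{54} \approx 65\%
\]
```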
Additional optimizations included consolidating load‑balancer instances, reducing Sentinel thread count from hundreds to four, and adopting JDK 17 with ZGC, which lowered GC pause times from ~70 ms to ~1 ms and improved overall throughput.
Conclusion
The DAG gateway demonstrates that a carefully engineered, Java‑based, Netty‑driven architecture can outperform the widely used SCG by a factor of four in throughput while delivering lower latency, higher stability, and better resource utilization, validating the feasibility of self‑developed high‑performance API gateways for large‑scale microservice environments.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.