Designing High‑Availability, High‑Performance Backend Architecture for Amap’s Real‑Time Services
This article explains how Amap (Gaode) handles its daily request volume with millisecond‑level latency by redesigning its gateway layer, adopting a fully asynchronous pipeline architecture, leveraging reactive frameworks like Vert.x and WebFlux, aggregating APIs, and implementing a unit‑based routing solution that paves the way for distributed sidecar and service‑mesh deployments.
Amap’s daily active users exceed one hundred million, generating hundreds of billions of requests that require ultra‑low latency for scenarios such as real‑time traffic, navigation, and driver‑passenger location sharing.
The presentation is divided into three parts: (1) challenges and thinking behind the access‑layer gateway, (2) high‑availability and high‑performance architecture design, and (3) future server‑side considerations and planning.
1. Access‑layer considerations – The gateway sits between client applications and various engines (driving, walking, etc.), serving over 80 applications, 500+ APIs, and peaking at 600k QPS. The main goal is to provide stable, efficient, and empowering services while minimizing resource usage.
To meet the sub‑5 ms response requirement for location services, Amap performed a major system upgrade that introduced a fully asynchronous, streaming pipeline architecture, halving the number of machines while doubling performance, and achieving 1 ms‑level response times.
Additional improvements included strengthening foundational support (API aggregation, data orchestration, traffic tagging and sharding) and introducing a unit‑based gateway solution to simplify multi‑region deployments.
2. High‑availability, high‑performance design – The legacy system suffered from low throughput (≈1.2k QPS) and instability under network jitter. The redesign focused on full async processing using Tomcat NIO, Async Servlet, and AsyncHttpClient, reaching 600k QPS with ~1 ms latency.
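The idea of full async processing can be sketched with nothing but the JDK. The example below is a minimal illustration, not Amap's gateway code: `AsyncPipelineSketch`, `callEngine`, the "driving" engine name, and all timings are hypothetical, and a single timer thread stands in for the non-blocking I/O that Tomcat NIO and AsyncHttpClient would provide in the setup described above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical sketch: class, method, and engine names are illustrative assumptions.
class AsyncPipelineSketch {
    // One daemon timer thread stands in for a non-blocking I/O event loop.
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fake-io");
                t.setDaemon(true);
                return t;
            });

    // Simulates a non-blocking backend call that completes after delayMillis.
    static CompletableFuture<String> callEngine(String engine, String request, long delayMillis) {
        CompletableFuture<String> result = new CompletableFuture<>();
        TIMER.schedule(() -> result.complete(engine + ":" + request),
                delayMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    // Returns immediately; no request thread is parked while the backend works.
    static CompletableFuture<String> handle(String request) {
        return callEngine("driving", request, 20)
                .thenApply(String::toUpperCase)               // post-processing stage
                .orTimeout(500, TimeUnit.MILLISECONDS)        // bound the wait under network jitter
                .exceptionally(ex -> "FALLBACK:" + request);  // degrade instead of failing
    }

    public static void main(String[] args) {
        // All three requests are in flight at once on a single I/O thread.
        List<CompletableFuture<String>> inFlight = List.of("a", "b", "c").stream()
                .map(AsyncPipelineSketch::handle)
                .collect(Collectors.toList());
        inFlight.forEach(f -> System.out.println(f.join()));
    }
}
```

The point of the sketch is that throughput is no longer capped by the number of request threads. A slow or jittery backend delays only its own future, and the timeout plus fallback keeps that delay bounded.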
Reactive programming was explored with Vert.x and Spring WebFlux. By converting synchronous calls to asynchronous ones and using Netty with Reactor, QPS increased threefold and response time dropped by 30 % (≈22 ms for complex workflows).
API aggregation and data orchestration were introduced to handle over 500 APIs and 400 data items, enabling customization and reuse while reducing coupling.
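As a rough illustration of aggregation plus data orchestration, the sketch below calls two backend APIs in parallel, merges their data items, and returns only the items the client asked for. Everything in it is assumed for the example: `fetchRoute`, `fetchTraffic`, and the field names are hypothetical and do not come from Amap's API catalogue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: the backend APIs and data items are invented for illustration.
class AggregationSketch {
    // Stand-in for a routing API that returns a flat map of data items.
    static CompletableFuture<Map<String, Object>> fetchRoute(String tripId) {
        return CompletableFuture.supplyAsync(() ->
                Map.<String, Object>of("distanceMeters", 1200, "etaSeconds", 300));
    }

    // Stand-in for a traffic API.
    static CompletableFuture<Map<String, Object>> fetchTraffic(String tripId) {
        return CompletableFuture.supplyAsync(() ->
                Map.<String, Object>of("congestion", "light", "incidents", 0));
    }

    // Aggregation: both calls run in parallel and are merged into one response.
    // Orchestration: only the requested data items are kept, in the requested order.
    static CompletableFuture<Map<String, Object>> aggregate(String tripId, List<String> wanted) {
        return fetchRoute(tripId).thenCombine(fetchTraffic(tripId), (route, traffic) -> {
            Map<String, Object> merged = new LinkedHashMap<>(route);
            merged.putAll(traffic);
            Map<String, Object> out = new LinkedHashMap<>();
            for (String key : wanted) {
                if (merged.containsKey(key)) {
                    out.put(key, merged.get(key));
                }
            }
            return out;
        });
    }

    public static void main(String[] args) {
        Map<String, Object> response =
                aggregate("trip-1", List.of("etaSeconds", "congestion")).join();
        System.out.println(response); // prints {etaSeconds=300, congestion=light}
    }
}
```

Because the client names the data items it wants, a new screen can reuse existing backend APIs without a new endpoint, which is one way to read the "customization and reuse while reducing coupling" goal.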
Unit‑based gateway and routing – Amap built a unit‑based routing mechanism to support multi‑region active‑active scenarios. Two routing strategies are offered: a routing‑table‑based approach for latency‑critical traffic and a modulo‑based approach for write‑intensive multi‑unit workloads.
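The two strategies can be contrasted in a few lines. This is a sketch under stated assumptions, not Amap's implementation: the unit names, the use of a user ID as the routing key, and the in-memory map standing in for the routing table are all hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: unit names and the userId routing key are assumptions.
class UnitRouterSketch {
    static final String[] UNITS = {"unit-a", "unit-b", "unit-c"};

    // Routing table: an explicit key-to-unit mapping maintained by the gateway.
    private final Map<Long, String> routingTable = new ConcurrentHashMap<>();

    void pin(long userId, String unit) {
        routingTable.put(userId, unit);
    }

    // Table-based strategy: one lookup, and a user can be placed in any unit.
    Optional<String> routeByTable(long userId) {
        return Optional.ofNullable(routingTable.get(userId));
    }

    // Modulo-based strategy: stateless, so every gateway node computes the same
    // unit for a given user and writes for that user always land in one unit.
    static String routeByModulo(long userId) {
        int index = (int) Math.floorMod(userId, (long) UNITS.length);
        return UNITS[index];
    }

    public static void main(String[] args) {
        UnitRouterSketch router = new UnitRouterSketch();
        router.pin(7L, "unit-a");
        System.out.println(router.routeByTable(7L).orElse("no entry")); // prints unit-a
        System.out.println(routeByModulo(7L));                          // prints unit-b
    }
}
```

The trade-off the sketch exposes: the table needs storage and synchronization but allows arbitrary placement, while the modulo rule needs no state but reshuffles users whenever the number of units changes.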
For the routing calculation core, three data structures were considered: BloomFilter, BitMap, and MapDB. BloomFilter was selected because its false‑positive rate is low and the cross‑unit routing that a false positive causes has an acceptable impact. Virtual group optimization reduces routing calculations from seven to three per request and enables gray‑scale traffic shifting.
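To make the false-positive trade-off concrete, here is a minimal Bloom filter built on `java.util.BitSet`. It is a teaching sketch, not the structure Amap runs in production: the FNV-1a hashing with two seeds, the double-hashing scheme, and the sizes are assumptions chosen for brevity. The assumed usage is one filter per unit holding the keys homed in that unit.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch: hash functions and sizing are illustrative assumptions.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // FNV-1a over the key bytes, started from a caller-supplied seed.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(byte[] data, int i) {
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x9747B28C) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(data, i));
        }
    }

    // false means "definitely absent"; true means "probably present".
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch unitA = new BloomFilterSketch(1 << 16, 3);
        unitA.add("user-42");
        System.out.println(unitA.mightContain("user-42")); // prints true
    }
}
```

A Bloom filter never reports a stored key as absent, so a user homed in a unit is always recognized there. The only error mode is a false positive, which in a routing context would send a request to a unit that then has to forward it, consistent with the small share of cross-unit traffic the article treats as acceptable.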
Current use cases (cloud sync, user services) show sub‑2 ms routing latency and less than 3 % cross‑unit traffic.
Future planning – Moving from a centralized gateway to a distributed model can be achieved via SDKs, sidecars, or a full service‑mesh. Sidecar deployment offers isolation and scalability, while the Service Mesh (based on Ant SOFA) provides decentralized high‑availability, horizontal scaling, and solves heterogeneous RPC challenges.
The article concludes with practical advice: when facing a need to halve machine count while doubling performance, a full‑link asynchronous architecture is often the most effective solution.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
