Evolution of Amap's Billion-Scale Traffic Access Layer Services
Sun Wei outlined Amap’s transformation of its traffic access layer—from handling 600,000‑plus QPS with sub‑2 ms latency through a fully asynchronous, stream‑based pipeline and reactive Vert.x/WebFlux experiments, to API aggregation, traffic tagging, and a roadmap toward distributed sidecar or SDK gateways for billion‑scale, low‑latency services.
At the 2019 Hangzhou Yunqi Conference, the Amap (Gaode) map technology team shared hot topics across multiple travel‑technology fields, including visual and machine intelligence, route planning, fine‑grained spatio‑temporal positioning data, and the evolution of a billion‑scale traffic architecture.
The following is a concise transcript of Sun Wei’s presentation titled “The Evolution Path of Amap’s Billion‑Scale Traffic Access Layer Services”, which discusses the architectural design and future planning of the access‑layer services that support Amap’s rapid business growth.
Sun Wei covered three main areas:
Access‑layer considerations and challenges
High‑availability and high‑performance architectural design
Amap’s server‑side thinking and roadmap
1. Access‑layer Considerations and Challenges
Amap’s gateway sits between the application layer and the various engines (driving, walking, etc.). It currently serves over 80 applications, exposing more than 500 APIs with peak QPS exceeding 600,000. The gateway must remain stable while improving efficiency and supporting business growth.
The main challenge is handling hundreds of billions of daily requests while meeting strict latency requirements (e.g., the location service must respond within 5 ms).
To address this, Amap performed a large‑scale architectural upgrade:
Stream‑based, fully asynchronous transformation, halving the number of machines and doubling performance.
Strengthening foundational support by aggregating interfaces, orchestrating data, and tagging/partitioning traffic.
Introducing a unit‑based gateway solution to simplify unitization for other services.
2. High‑Availability and High‑Performance Architectural Design
Before refactoring, the service suffered low performance (≈1,200 QPS per BC server) and high stability risk due to network jitter. The key improvement was full asynchronous processing.
The access‑layer evolution went through three stages:
Stage 1: Asynchronous + Pipeline Refactor
Implemented a stream‑based, fully asynchronous architecture using Tomcat NIO, Async Servlet, and AsyncHttpClient. The gateway achieved peak QPS of 600,000 with response times around 1 ms.
The pipeline architecture introduced extension points at critical upstream and downstream nodes, addressing legacy interface baggage and preventing chain‑wide blocking.
Result: single‑machine performance increased by 400 %, with latency below 2 ms (typically ~1 ms).
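The fully asynchronous pipeline described above can be sketched with JDK primitives alone: each stage returns a `CompletableFuture`, and stages are composed so that no worker thread blocks while a downstream call is in flight. All class and method names here are hypothetical illustrations, not Amap’s actual code, and the placeholder stages stand in for Async Servlet handling and AsyncHttpClient calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch of a fully asynchronous gateway pipeline (names are hypothetical).
// Each stage returns a CompletableFuture, so no stage blocks a worker thread.
public class AsyncPipeline {
    // Extension point: a stage transforms the request context asynchronously.
    interface Stage extends Function<String, CompletableFuture<String>> {}

    private final Stage preFilter;   // e.g. auth, traffic tagging
    private final Stage backendCall; // e.g. an AsyncHttpClient-style downstream call
    private final Stage postFilter;  // e.g. response shaping

    AsyncPipeline(Stage pre, Stage call, Stage post) {
        this.preFilter = pre; this.backendCall = call; this.postFilter = post;
    }

    // Compose the stages; the chain completes when the downstream responds.
    CompletableFuture<String> handle(String request) {
        return preFilter.apply(request)
                .thenCompose(backendCall)
                .thenCompose(postFilter);
    }

    public static void main(String[] args) {
        AsyncPipeline p = new AsyncPipeline(
            req -> CompletableFuture.supplyAsync(() -> req + "|tagged"),
            req -> CompletableFuture.supplyAsync(() -> req + "|engine-reply"),
            resp -> CompletableFuture.completedFuture(resp + "|shaped"));
        System.out.println(p.handle("GET /route").join());
    }
}
```

Because each extension point is just another asynchronous stage, a slow filter or downstream engine delays only its own future rather than blocking the whole chain, which is what enables the single‑machine throughput gains described above.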
Stage 2: Reactive Programming Exploration – Vert.x & WebFlux
Adopted Vert.x for reactive I/O tasks and data orchestration, while retaining AsyncHttpClient for HTTP calls. This yielded ~50,000 QPS with ~22 ms response time.
In complex scenarios (e.g., a ride‑hailing flow invoking up to 27 downstream services), WebFlux proved more suitable, enabling full reactive programming with Netty and Reactor, eliminating thread blocking and maximizing CPU utilization.
Outcome: QPS increased threefold and response time decreased by 30 %.
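The reactive fan‑out that makes WebFlux attractive here can be illustrated with a minimal stdlib analogue (the real implementation uses Netty and Reactor, which are not shown): the gateway calls N downstream services concurrently and merges the results, blocking no thread until the final join. Service names and counts are placeholders.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Stdlib analogue of the reactive fan-out: call N downstream services
// concurrently and merge the results (WebFlux/Reactor would express
// this as a Flux pipeline over non-blocking Netty I/O).
public class FanOut {
    static CompletableFuture<String> callService(int id) {
        // Placeholder for a non-blocking HTTP call to downstream service `id`.
        return CompletableFuture.supplyAsync(() -> "svc" + id + ":ok");
    }

    static CompletableFuture<List<String>> aggregate(int serviceCount) {
        List<CompletableFuture<String>> calls = IntStream.range(0, serviceCount)
                .mapToObj(FanOut::callService)
                .collect(Collectors.toList());
        // Complete only when every downstream call has completed.
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                .thenApply(v -> calls.stream()
                        .map(CompletableFuture::join) // already complete here
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // e.g. the ride-hailing flow fans out to many services (27 in the talk).
        System.out.println(aggregate(3).join());
    }
}
```

With 27 downstream calls in flight at once, end‑to‑end latency is bounded by the slowest service rather than the sum of all calls, which is where the QPS and response‑time gains come from.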
Stage 3: API Aggregation, Data Orchestration & Tagging
With over 500 APIs and 400+ data fields, Amap introduced API aggregation and data orchestration to support customization and reuse.
Tagging and traffic splitting help mitigate risks during service upgrades and model tuning. The current unit‑based gateway supports both routing‑table and modulo‑based strategies, achieving sub‑2 ms routing latency and less than 3 % cross‑unit routing.
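The two routing strategies mentioned above can be sketched as follows (a hypothetical illustration, not Amap’s implementation): an explicit routing table pins specific users to a unit, and everything else falls back to user‑id modulo the unit count. Both lookups are constant‑time, which is consistent with sub‑2 ms routing.

```java
import java.util.Map;

// Sketch of unit-based routing (hypothetical API): a routing table takes
// precedence; otherwise fall back to userId modulo the number of units.
public class UnitRouter {
    private final Map<Long, Integer> routingTable; // userId -> pinned unit
    private final int unitCount;

    UnitRouter(Map<Long, Integer> table, int units) {
        this.routingTable = table; this.unitCount = units;
    }

    int route(long userId) {
        Integer pinned = routingTable.get(userId);
        return pinned != null ? pinned : (int) (userId % unitCount);
    }

    public static void main(String[] args) {
        UnitRouter r = new UnitRouter(Map.of(42L, 2), 3);
        System.out.println(r.route(42L)); // pinned by the routing table
        System.out.println(r.route(7L));  // 7 % 3 = 1 via modulo fallback
    }
}
```

The routing‑table path supports targeted traffic splitting (e.g., during upgrades or model tuning), while the modulo path keeps the default distribution even; minimizing table entries is one way to keep cross‑unit routing low.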
3. Server‑Side Thinking and Roadmap
Future work focuses on transforming the centralized gateway into a distributed solution. Two implementation paths are considered:
SDK‑based distributed gateway (already handling hundreds of billions of daily requests, but with limited heterogeneity support and isolation).
Sidecar or service‑mesh approach (offers better isolation and heterogeneity handling). Amap is experimenting with a sidecar model managed by a Gateway Control Manager, built on Ant SOFA, to address cross‑service RPC challenges.
Recommendation: For services facing the challenge of halving machine count while doubling performance, a fully asynchronous, end‑to‑end pipeline architecture can be highly beneficial.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.