Evolution of Amap's Billion-Scale Traffic Access Layer Services
Sun Wei outlined Amap’s transformation of its traffic access layer—from handling 600,000‑plus QPS with sub‑2 ms latency through a fully asynchronous, stream‑based pipeline and reactive Vert.x/WebFlux experiments, to API aggregation, traffic tagging, and a roadmap toward distributed sidecar or SDK gateways for billion‑scale, low‑latency services.
At the 2019 Hangzhou Yunqi Conference, the Amap (Gaode) map technology team shared hot topics across multiple travel‑technology fields, including visual and machine intelligence, route planning, fine‑grained spatio‑temporal positioning data, and the evolution of a billion‑scale traffic architecture.
The following is a concise transcript of Sun Wei’s presentation titled “The Evolution Path of Amap’s Billion‑Scale Traffic Access Layer Services”, which discusses the architectural design and future planning of the access‑layer services that support Amap’s rapid business growth.
Sun Wei covered three main areas:
Access‑layer considerations and challenges
High‑availability and high‑performance architectural design
Amap’s server‑side thinking and roadmap
1. Access‑layer Considerations and Challenges
Amap’s gateway sits between the application layer and the various engines (driving, walking, etc.). It currently serves over 80 applications, exposing more than 500 APIs with peak QPS exceeding 600,000. The gateway must remain stable while improving efficiency and supporting business growth.
The main challenge is handling hundreds of billions of daily requests while meeting strict latency requirements (e.g., the location service must respond within 5 ms).
To address this, Amap performed a large‑scale architectural upgrade:
Stream‑based, fully asynchronous transformation, halving the number of machines and doubling performance.
Strengthening foundational support by aggregating interfaces, orchestrating data, and tagging/partitioning traffic.
Introducing a unit‑based gateway solution to simplify unitization for other services.
2. High‑Availability and High‑Performance Architectural Design
Before refactoring, the service suffered low performance (≈1,200 QPS per BC server) and high stability risk due to network jitter. The key improvement was full asynchronous processing.
The access‑layer evolution went through three stages:
Stage 1: Asynchronous + Pipeline Refactor
Implemented a stream‑based, fully asynchronous architecture using Tomcat NIO, Async Servlet, and AsyncHttpClient. The gateway achieved peak QPS of 600,000 with response times around 1 ms.
The pipeline architecture introduced extension points at critical upstream and downstream nodes, addressing legacy interface baggage and preventing chain‑wide blocking.
Result: single‑machine performance increased by 400 %, with latency below 2 ms (typically ~1 ms).
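The fully asynchronous pipeline described above can be sketched with JDK primitives alone: each stage returns a `CompletableFuture`, and stages are composed so that no worker thread blocks while a downstream call is in flight. All class and method names here are hypothetical illustrations, not Amap’s actual code, and the placeholder stages stand in for Async Servlet handling and AsyncHttpClient calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch of a fully asynchronous gateway pipeline (names are hypothetical).
// Each stage returns a CompletableFuture, so no stage blocks a worker thread.
public class AsyncPipeline {
    // Extension point: a stage transforms the request context asynchronously.
    interface Stage extends Function<String, CompletableFuture<String>> {}

    private final Stage preFilter;   // e.g. auth, traffic tagging
    private final Stage backendCall; // e.g. an AsyncHttpClient-style downstream call
    private final Stage postFilter;  // e.g. response shaping

    AsyncPipeline(Stage pre, Stage call, Stage post) {
        this.preFilter = pre; this.backendCall = call; this.postFilter = post;
    }

    // Compose the stages; the chain completes when the downstream responds.
    CompletableFuture<String> handle(String request) {
        return preFilter.apply(request)
                .thenCompose(backendCall)
                .thenCompose(postFilter);
    }

    public static void main(String[] args) {
        AsyncPipeline p = new AsyncPipeline(
            req -> CompletableFuture.supplyAsync(() -> req + "|tagged"),
            req -> CompletableFuture.supplyAsync(() -> req + "|engine-reply"),
            resp -> CompletableFuture.completedFuture(resp + "|shaped"));
        System.out.println(p.handle("GET /route").join());
    }
}
```

Because each extension point is just another asynchronous stage, a slow filter or downstream engine delays only its own future rather than blocking the whole chain, which is what enables the single‑machine throughput gains described above.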
Stage 2: Reactive Programming Exploration – Vert.x & WebFlux
Adopted Vert.x for reactive I/O tasks and data orchestration, while retaining AsyncHttpClient for HTTP calls. This yielded ~50,000 QPS with ~22 ms response time.
In complex scenarios (e.g., a ride‑hailing flow invoking up to 27 downstream services), WebFlux proved more suitable, enabling full reactive programming with Netty and Reactor, eliminating thread blocking and maximizing CPU utilization.
Outcome: QPS increased threefold and response time decreased by 30 %.
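The reactive fan‑out that makes WebFlux attractive here can be illustrated with a minimal stdlib analogue (the real implementation uses Netty and Reactor, which are not shown): the gateway calls N downstream services concurrently and merges the results, blocking no thread until the final join. Service names and counts are placeholders.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Stdlib analogue of the reactive fan-out: call N downstream services
// concurrently and merge the results (WebFlux/Reactor would express
// this as a Flux pipeline over non-blocking Netty I/O).
public class FanOut {
    static CompletableFuture<String> callService(int id) {
        // Placeholder for a non-blocking HTTP call to downstream service `id`.
        return CompletableFuture.supplyAsync(() -> "svc" + id + ":ok");
    }

    static CompletableFuture<List<String>> aggregate(int serviceCount) {
        List<CompletableFuture<String>> calls = IntStream.range(0, serviceCount)
                .mapToObj(FanOut::callService)
                .collect(Collectors.toList());
        // Complete only when every downstream call has completed.
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                .thenApply(v -> calls.stream()
                        .map(CompletableFuture::join) // already complete here
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // e.g. the ride-hailing flow fans out to many services (27 in the talk).
        System.out.println(aggregate(3).join());
    }
}
```

With 27 downstream calls in flight at once, end‑to‑end latency is bounded by the slowest service rather than the sum of all calls, which is where the QPS and response‑time gains come from.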
Stage 3: API Aggregation, Data Orchestration & Tagging
With over 500 APIs and 400+ data fields, Amap introduced API aggregation and data orchestration to support customization and reuse.
Tagging and traffic splitting help mitigate risks during service upgrades and model tuning. The current unit‑based gateway supports both routing‑table and modulo‑based strategies, achieving sub‑2 ms routing latency and less than 3 % cross‑unit routing.
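The two routing strategies mentioned above can be sketched as follows (a hypothetical illustration, not Amap’s implementation): an explicit routing table pins specific users to a unit, and everything else falls back to user‑id modulo the unit count. Both lookups are constant‑time, which is consistent with sub‑2 ms routing.

```java
import java.util.Map;

// Sketch of unit-based routing (hypothetical API): a routing table takes
// precedence; otherwise fall back to userId modulo the number of units.
public class UnitRouter {
    private final Map<Long, Integer> routingTable; // userId -> pinned unit
    private final int unitCount;

    UnitRouter(Map<Long, Integer> table, int units) {
        this.routingTable = table; this.unitCount = units;
    }

    int route(long userId) {
        Integer pinned = routingTable.get(userId);
        return pinned != null ? pinned : (int) (userId % unitCount);
    }

    public static void main(String[] args) {
        UnitRouter r = new UnitRouter(Map.of(42L, 2), 3);
        System.out.println(r.route(42L)); // pinned by the routing table
        System.out.println(r.route(7L));  // 7 % 3 = 1 via modulo fallback
    }
}
```

The routing‑table path supports targeted traffic splitting (e.g., during upgrades or model tuning), while the modulo path keeps the default distribution even; minimizing table entries is one way to keep cross‑unit routing low.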
3. Server‑Side Thinking and Roadmap
Future work focuses on transforming the centralized gateway into a distributed solution. Two implementation paths are considered:
SDK‑based distributed gateway (already handling hundreds of billions of daily requests, but with limited heterogeneity support and isolation).
Sidecar or service‑mesh approach (offers better isolation and heterogeneity handling). Amap is experimenting with a sidecar model managed by a Gateway Control Manager, built on Ant SOFA, to address cross‑service RPC challenges.
Recommendation: For services facing the challenge of halving machine count while doubling performance, a fully asynchronous, end‑to‑end pipeline architecture can be highly beneficial.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.