Inside Taobao’s High‑Performance API Gateway: Scaling to Billions of Calls

This article reveals how Taobao Open Platform’s API gateway, message service, and data‑sync system achieve ultra‑high concurrency, low latency, and zero‑loss reliability through multi‑level caching, asynchronous processing, traffic control, and distributed storage techniques.

ITFLY8 Architecture Home

Aug 15, 2021

High‑Performance API Gateway

Taobao Open Platform (open.taobao.com) is Alibaba’s primary hub for communication with external systems, handling hundreds of billions of API calls, message pushes, and billions of data syncs daily. This scale is the result of eight years of Double‑11 traffic growth.

Internal data resides in independent business systems (e.g., product center, transaction platform, user center) and is exchanged via HSF (High‑speed Service Framework). To safely expose this data to external merchants and ISVs, an API gateway was created.

Overall Architecture

The gateway uses a pipeline design to handle business, security, routing, and invocation logic. To meet Double‑11 peak QPS (near one million), several optimizations were applied:

Metadata reads use a rich‑client multi‑level cache with asynchronous refresh, supporting tens of millions of QPS while controlling network congestion.

Thread resources are conserved by offloading remote service calls to HSF or HTTP NIO clients, releasing servlet threads early. Responses are processed asynchronously via an event‑driven model and Jetty Continuation, achieving full async handling.

Multi‑Level Rich‑Client Cache

Metadata (flow control, field levels, categories, app keys, IP whitelists, permission packages, user auth, etc.) can reach tens of millions of QPS during Double‑11. Hitting the database directly is infeasible, so a three‑layer cache is used: a distributed cache, an LRU local cache, and a Bloom filter layer that guards against cache penetration, i.e. lookups for nonexistent keys that would otherwise miss every cache and fall through to the database. Cache rules are pushed dynamically, and briefly serving expired data is tolerated while an asynchronous task refreshes it.
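The layered lookup described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the class names, the `load_from_db` callback, and the plain dict standing in for the distributed cache are all assumptions, and the asynchronous refresh of expired entries is left out for brevity.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive several bit positions per key from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class MultiLevelCache:
    """Hypothetical three-layer read path: Bloom filter, local LRU, remote cache."""

    def __init__(self, known_keys, remote_cache, load_from_db, local_capacity=1000):
        self.bloom = BloomFilter()
        for key in known_keys:
            self.bloom.add(key)
        self.local = OrderedDict()        # LRU local cache
        self.local_capacity = local_capacity
        self.remote = remote_cache        # a dict stands in for the distributed cache
        self.load_from_db = load_from_db  # assumed callback: key -> value or None

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None                   # unknown key: never reaches cache or DB
        if key in self.local:
            self.local.move_to_end(key)   # mark as most recently used
            return self.local[key]
        value = self.remote.get(key)
        if value is None:
            value = self.load_from_db(key)
            if value is not None:
                self.remote[key] = value
        if value is not None:
            self._put_local(key, value)
        return value

    def _put_local(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.local_capacity:
            self.local.popitem(last=False)  # evict least recently used
```

The Bloom filter is checked first in this sketch because it is the cheapest test and is the only layer that can reject a key outright.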

High‑Performance Batch API Calls

ISVs often need to invoke multiple APIs sequentially, causing high RT and network overhead. The gateway provides a batch‑call mode: the TOP SDK merges requests, the gateway splits them, performs asynchronous remote calls, and finally merges and returns results.
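The split, concurrent call, and merge steps might look something like the sketch below. `call_api` is a hypothetical stand-in for an asynchronous HSF or HTTP call, and the method names are invented for illustration; the real SDK wire format is not described in the article.

```python
import asyncio


async def call_api(method, params):
    """Hypothetical stand-in for one asynchronous remote service call."""
    await asyncio.sleep(0.01)  # simulate network latency without blocking a thread
    if method == "fail.me":
        raise RuntimeError("backend error")
    return {"method": method, "echo": params}


async def handle_batch(batch):
    """Split a merged request, call backends concurrently, merge results in order."""
    tasks = [call_api(item["method"], item["params"]) for item in batch]
    # return_exceptions=True keeps one failing call from discarding the others.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    merged = []
    for item, outcome in zip(batch, outcomes):
        if isinstance(outcome, Exception):
            merged.append({"method": item["method"], "error": str(outcome)})
        else:
            merged.append({"method": item["method"], "result": outcome})
    return merged
```

Because the calls run concurrently, the total response time of a batch is close to that of its slowest call rather than the sum of all calls.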

Multi‑Dimensional Traffic Control

With hundreds of billions of daily calls, the gateway implements various flow‑control rules (per‑second, daily limits per API, per‑APPKEY, etc.). For APIs with limited capacity (e.g., 200k QPS vs. 400k demand), traffic is divided into groups with configurable ratios, ensuring critical calls receive higher priority.
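One way to picture the group‑ratio idea is a per‑second limiter that splits an API's capacity into fixed quotas per group. The sketch below is an assumption about how such a rule could work, not the gateway's implementation; the group names and the `GroupedRateLimiter` class are hypothetical, and the caller passes the current time explicitly so the behavior is easy to test.

```python
class GroupedRateLimiter:
    """Splits a per-second capacity across traffic groups by configured ratio."""

    def __init__(self, capacity_per_second, ratios):
        total = sum(ratios.values())
        # Integer quota per group, e.g. 10 QPS at 7:3 gives 7 and 3.
        self.quota = {g: capacity_per_second * r // total for g, r in ratios.items()}
        self.window = None  # the second currently being counted
        self.used = {}

    def allow(self, group, now):
        second = int(now)
        if second != self.window:  # a new second starts: reset all counters
            self.window = second
            self.used = {g: 0 for g in self.quota}
        if self.used[group] >= self.quota[group]:
            return False           # this group's share is exhausted
        self.used[group] += 1
        return True
```

With ratios of 7:3, a flood of low‑priority calls can consume at most 30% of the capacity, so the critical group keeps its 70% share even when total demand is double what the API can serve.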

Both cluster‑wide and single‑machine flow control are used. Cluster‑wide control keeps limits accurate when traffic is spread unevenly across nodes or when a single node fails, while single‑machine control (using Google ConcurrentLinkedHashMap) handles high‑QPS APIs with LRU eviction.
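As a rough illustration of single‑machine flow control with LRU eviction, the sketch below keeps per‑key counters in an `OrderedDict`, which stands in here for ConcurrentLinkedHashMap. The class and parameter names are hypothetical, and unlike the Java library this sketch is not thread‑safe.

```python
from collections import OrderedDict


class LocalRateLimiter:
    """Per-key, per-second counters with a bounded, LRU-evicted key set."""

    def __init__(self, limit_per_second, max_keys):
        self.limit = limit_per_second
        self.max_keys = max_keys
        self.counters = OrderedDict()  # key -> [window_second, count]

    def allow(self, key, now):
        second = int(now)
        entry = self.counters.get(key)
        if entry is None or entry[0] != second:
            entry = [second, 0]        # first call in this second for this key
            self.counters[key] = entry
        self.counters.move_to_end(key)  # mark key as most recently used
        if len(self.counters) > self.max_keys:
            self.counters.popitem(last=False)  # drop the least recently used key
        if entry[1] >= self.limit:
            return False
        entry[1] += 1
        return True
```

Bounding the map keeps memory flat no matter how many distinct keys appear; the trade‑off is that a rarely used key loses its counter when evicted and simply starts a fresh window on its next call.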

High‑Reliability Message Service

The message service offers a real‑time, reliable, asynchronous bidirectional channel, processing billions of messages daily and supporting millions of concurrent streams.

Overall Architecture

The system consists of routing, storage, and push subsystems. Messages are stored first, then pushed, guaranteeing at‑least‑once delivery. The routing layer filters, authenticates, transforms, and logs events, feeding them to a JStorm analytics cluster.

Push vs. Pull

Push mode provides higher real‑time performance (average latency ~100 ms, max <200 ms) and reduces server load compared to pull polling. The system also supports pull‑to‑push conversion for clients that prefer pulling.

Ensuring Low‑Latency Push

All push machines form a notification network; when any machine detects a new message, it quickly notifies the machine handling the target long‑connection, enabling rapid delivery.

Fast Message Confirmation

Each pushed message opens a transaction: if the client does not acknowledge it within a timeout, the message is resent. Over the public internet a 5‑second timeout is not enough, so unacknowledged messages are kept in persistent storage rather than only in memory. For high‑throughput scenarios, a hybrid in‑memory and disk approach is used.
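The acknowledge‑or‑resend cycle can be sketched as below. This is a simplified, in‑memory illustration only: `PushTransactions` and its `send` callback are hypothetical names, and the persistence of pending messages discussed above is deliberately omitted.

```python
class PushTransactions:
    """Tracks pushed messages until acknowledged; resends after a timeout."""

    def __init__(self, send, ack_timeout):
        self.send = send            # assumed callback: (msg_id, payload) -> None
        self.ack_timeout = ack_timeout
        self.pending = {}           # msg_id -> (payload, resend_deadline)

    def push(self, msg_id, payload, now):
        self.send(msg_id, payload)
        self.pending[msg_id] = (payload, now + self.ack_timeout)

    def ack(self, msg_id):
        # Acknowledgement closes the transaction; duplicate acks are ignored.
        self.pending.pop(msg_id, None)

    def resend_expired(self, now):
        """Resend every message whose deadline has passed; return their ids."""
        resent = []
        for msg_id, (payload, deadline) in list(self.pending.items()):
            if now >= deadline:
                self.send(msg_id, payload)
                self.pending[msg_id] = (payload, now + self.ack_timeout)
                resent.append(msg_id)
        return resent
```

Because a message can be resent after the client has already processed it but before its acknowledgement arrives, this scheme gives at‑least‑once delivery, and clients need to handle duplicates.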

The storage subsystem uses three layers: HeapMemory for the latest 10 seconds, DirectMemory for recent entries, and FileSystem for long‑term persistence, minimizing DB load.

Zero‑Loss Data Sync

Traditional DB replication cannot meet the flexibility and safety needs of complex, high‑traffic external sync scenarios. Taobao’s solution combines a message‑driven real‑time path with periodic reconciliation tasks to guarantee both timeliness and consistency.

Distributed Data Consistency Guarantee

Order‑related messages carry order IDs; duplicate messages that arrive within a short window are merged, and the sync client fetches the order details to update the user DB. Reconciliation tasks run continuously and are assigned to live clients, identified by heartbeat, using modulo arithmetic to keep the load balanced.
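A minimal sketch of these two steps, under stated assumptions: messages are dicts with a `tid` field, heartbeats are a map from client name to last‑seen time, and task IDs are integers. All function names are hypothetical.

```python
def merge_order_ids(messages):
    """Collapse duplicate messages for the same order, keeping first-seen order."""
    return list(dict.fromkeys(m["tid"] for m in messages))


def live_clients(heartbeats, now, timeout):
    """Clients whose last heartbeat is recent enough, in a stable sorted order."""
    return sorted(c for c, last_seen in heartbeats.items() if now - last_seen <= timeout)


def assign_tasks(task_ids, clients):
    """Spread reconciliation tasks over live clients by task_id modulo client count."""
    if not clients:
        raise ValueError("no live clients to assign tasks to")
    assignment = {c: [] for c in clients}
    for task_id in task_ids:
        assignment[clients[task_id % len(clients)]].append(task_id)
    return assignment
```

Sorting the live clients matters: every node that computes the assignment from the same heartbeat data arrives at the same mapping. When a client dies, the modulus changes and tasks are redistributed on the next round.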

Resource Dynamic Allocation & Isolation

Logical clusters isolate hot‑spot users, assigning dedicated machines to them, which solves DB connection saturation and allows resource skew for high‑priority users during promotions.

General Data Storage Model

Order data is stored with core fields (order ID, seller nickname, update time) extracted, while the full order JSON is kept in a large field. A hashcode of the JSON plus modified time enables fast incremental sync without full content comparison.
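The storage model and the hash‑based change check might be sketched as follows. The column names `seller_nick` and `jdp_hashcode` and the choice of SHA‑256 are assumptions for illustration; the article does not say which hash function is used.

```python
import hashlib
import json


def build_row(order):
    """Extract core fields and keep the full order JSON plus its hash."""
    # Canonical JSON (sorted keys) so equal orders always hash the same.
    body = json.dumps(order, sort_keys=True, separators=(",", ":"))
    return {
        "tid": order["tid"],
        "seller_nick": order["seller_nick"],
        "modified": order["modified"],
        "jdp_hashcode": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "jdp_response": body,
    }


def needs_update(stored_row, incoming_order):
    """Compare (modified, hash) only; the large JSON field is never compared."""
    incoming = build_row(incoming_order)
    return (incoming["modified"], incoming["jdp_hashcode"]) != (
        stored_row["modified"], stored_row["jdp_hashcode"])
```

Comparing two short values instead of the whole JSON document keeps the incremental check cheap even when orders are large.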

Reducing Data Write Overhead

During Double‑11, the external databases become the bottleneck. Instead of a SELECT followed by an UPDATE, the sync client issues a single UPDATE guarded by a modified‑time check, so no prior read is needed and stale data cannot overwrite newer data. This cuts DB accesses by ~90%.

Original version:

SELECT * FROM jdp_tb_trade WHERE tid = #tid#;
UPDATE jdp_tb_trade SET jdp_response = #jdpResponse#, jdp_modified = now() WHERE tid = #tid#;

Optimized version:

UPDATE jdp_tb_trade SET jdp_response = #jdpResponse#, jdp_modified = now() WHERE tid = #tid# AND modified < #modified#;
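To see the guarded update in action, here is a small sketch using Python's built‑in `sqlite3` module in place of the merchant's database. The table layout is a simplified assumption, and the sketch also writes the `modified` column in the same statement (not shown in the article's SQL) so that the guard advances with each accepted update.

```python
import sqlite3


def create_schema(conn):
    # Simplified, assumed layout of the sync table.
    conn.execute(
        "CREATE TABLE jdp_tb_trade ("
        "tid INTEGER PRIMARY KEY, jdp_response TEXT, modified TEXT, jdp_modified TEXT)"
    )


def sync_trade(conn, tid, jdp_response, modified):
    """Apply the update only if the incoming data is newer; no SELECT needed."""
    cur = conn.execute(
        "UPDATE jdp_tb_trade "
        "SET jdp_response = ?, modified = ?, jdp_modified = CURRENT_TIMESTAMP "
        "WHERE tid = ? AND modified < ?",
        (jdp_response, modified, tid, modified),
    )
    conn.commit()
    # rowcount is 1 when the row existed and was older, 0 otherwise.
    return cur.rowcount == 1
```

The affected‑row count tells the client whether the write took effect. A result of zero means either the stored row is already as new or the row does not exist; handling the missing‑row case (an insert) is left out of this sketch.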

Logical deletion and off‑peak batch cleanup further improve performance.


