
How Tencent’s TGW Delivers 3× Faster Throughput and Near‑Zero Downtime at Scale

The USENIX‑selected paper on Tencent’s TGW cloud gateway reveals how a modular, multi‑layer architecture achieves up to 2.9‑fold throughput gains, seconds‑level elastic scaling, lossless hot migration, and sub‑second fault recovery, offering a blueprint for resilient large‑scale cloud networking.

Tencent Technical Engineering

The paper "TGW: Operating an Efficient and Resilient Cloud Gateway at Scale" (selected for USENIX ATC ’25) is co‑authored by Tencent’s gateway team and researchers from Tsinghua University and Renmin University. It systematically describes the TGW architecture that has been running in production for eight years, highlighting its ultra‑high performance forwarding, seconds‑level elastic scaling, and robust fault‑tolerance mechanisms.

Background and Goals

Large‑scale cloud data centers serve as the backbone of the Internet. As the public‑facing entry point for diverse workloads—online games, live streaming, finance, etc.—the gateway must handle exponential traffic growth while meeting strict latency and reliability requirements. TGW aims to provide:

Ultra‑high‑performance forwarding (up to 2.9× that of traditional solutions).

Seconds‑level elastic scaling with lossless migration.

Near‑100% availability, with packet loss rates held within the 10⁻⁷–10⁻⁴ range.

Precise fault detection and rapid recovery.

Architecture Overview

TGW follows a hierarchical modular design divided into three logical parts:

Forwarding Plane: Stateless TGW‑EIP (elastic public access) and stateful TGW‑CLB (cloud load balancer).

Control Plane: Global orchestrator, local operator, and distributed load distributor (LD).

Auxiliary Components: BGP + ECMP routing, a probe system for fault detection, and log aggregation agents.

Deployment places TGW‑EIP at the region entry point and TGW‑CLB inside each availability zone (AZ). Inbound traffic flow:

Traffic enters via BGP, reaches TGW‑EIP clusters, undergoes NAT and tunnel encapsulation, then is handed to backend servers or TGW‑CLB.

TGW‑CLB distributes traffic based on service identifiers (IP 5‑tuple or QUIC connection ID) using a pipeline + RTC hybrid model, ensuring flow affinity via odd‑even routing.
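The flow-affinity idea can be sketched as a symmetric hash over the connection's 5‑tuple: sorting the two endpoints before hashing makes the forward and reverse packets of one connection land on the same node. This is only an illustration of the principle, not TGW's actual odd‑even routing algorithm, which the article does not detail; all names below are invented.

```python
import hashlib

def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key: sorting the two endpoints makes
    the forward and reverse packets of one flow produce the same key."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{proto}"

def pick_node(key, nodes):
    """Map a flow key onto one forwarding node via a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]

nodes = ["node-0", "node-1", "node-2", "node-3"]
fwd = pick_node(flow_key("10.0.0.1", 40000, "1.2.3.4", 443, "tcp"), nodes)
rev = pick_node(flow_key("1.2.3.4", 443, "10.0.0.1", 40000, "tcp"), nodes)
assert fwd == rev  # both directions of the flow reach the same node
```

A stable hash (rather than Python's randomized built-in `hash`) matters here: node selection must survive process restarts so flows keep their affinity.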

Key Technical Innovations

Efficient Forwarding Plane

Two specialized forwarding models are employed:

TGW‑EIP uses a Run‑to‑Completion (RTC) model with optimizations such as single‑core batch processing, hash‑lookup prefetch, and conflict handling. These yield a 53% throughput increase and stable latency of 66–105 µs.

TGW‑CLB adopts a Pipeline + RTC hybrid model, dynamic dispatch based on service identifiers, lock‑free ring buffers, and a 1:2 dispatch‑to‑process ratio. Performance reaches 2.9× that of the Tripod baseline for 512‑byte packets.
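A minimal sketch of the pipeline + RTC split described above: one dispatch stage hashes the service identifier into per-worker queues, and each worker drains its queue run-to-completion. Plain deques stand in for TGW's lock-free rings, and the class and field names are invented for illustration.

```python
from collections import deque

class HybridDispatcher:
    """Toy pipeline + RTC split: a dispatch stage feeds run-to-completion
    workers. Deques stand in for the lock-free ring buffers used in TGW."""
    def __init__(self, n_workers=2):
        self.rings = [deque() for _ in range(n_workers)]

    def dispatch(self, packet):
        # Keying on the service identifier sends every packet of a flow
        # to the same worker, which owns that flow's connection state.
        ring = hash(packet["service_id"]) % len(self.rings)
        self.rings[ring].append(packet)

    def run_worker(self, i):
        # Run-to-completion: drain the ring, fully processing each packet.
        out = []
        while self.rings[i]:
            out.append(self.rings[i].popleft()["service_id"])
        return out

d = HybridDispatcher()
for sid in ["a", "b", "a", "c", "a"]:
    d.dispatch({"service_id": sid})
drained = [d.run_worker(i) for i in range(2)]
# every packet of flow "a" was processed by a single worker
```

The 1:2 dispatch-to-process ratio means one dispatch core like this feeds two such workers, keeping the dispatch stage from becoming the bottleneck.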

State Migration Mechanism

TGW supports lossless hot migration that completes within 4 seconds, copying both configuration (VIP‑DIP mappings) and active connection state. The migration workflow includes:

State replication of stateless rules followed by dynamic connection state.

90% of state transferred before the new cluster announces BGP routes.

Proxying of unrecognized traffic from the new cluster to old forwarding nodes.

Fast convergence: backend switches learn the new source IP and complete reverse‑flow migration within 4 seconds.

VIP‑granular migration avoids per‑connection moves, handling up to 240 M connections per node.

Independent migration threads decouple migration from data‑plane processing.
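The proxying step in the workflow above can be illustrated with a toy lookup: flows whose state has already been copied are handled locally, while unrecognized flows are bounced back to the old cluster instead of being reset. All names here are hypothetical, not from the paper.

```python
class OldCluster:
    """Stand-in for the old forwarding nodes during migration."""
    def handle(self, flow_id):
        return "forward-old"

class MigratingNode:
    """Sketch of the migration fallback: during cutover, flows whose
    connection state has not yet arrived are proxied to the old cluster."""
    def __init__(self, known_flows, old_cluster):
        self.known_flows = set(known_flows)  # state copied so far
        self.old_cluster = old_cluster

    def handle(self, flow_id):
        if flow_id in self.known_flows:
            return "forward-local"
        # Unrecognized flow: its state still lives on the old cluster.
        return self.old_cluster.handle(flow_id)

node = MigratingNode(known_flows={"f1", "f2"}, old_cluster=OldCluster())
assert node.handle("f1") == "forward-local"
assert node.handle("f9") == "forward-old"  # proxied until state arrives
```

Because 90% of state is replicated before BGP announcement, this fallback path only carries a small tail of traffic for the few seconds the migration takes.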

Fault Recovery Mechanism

Multi‑level fault‑tolerance is realized through:

Active‑Active design within an AZ, sharing link tables among forwarding nodes.

Active‑Standby across AZs, with BGP prefix switching on failure.

DNS redirection for cross‑region disasters.

Cluster‑internal link synchronization that syncs only connections alive for more than 3 s (filtering out short‑lived flows) and exports entries in batches once an MTU‑sized payload or a timeout threshold is reached, achieving 130 M synchronized connections per node at 350 Mbps peak bandwidth.
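The filter-then-batch logic might look roughly like the sketch below. Only the 3 s age filter and the MTU trigger come from the article; the 40‑byte entry size and 100 ms timeout are assumed placeholders.

```python
import time

class LinkTableSync:
    """Sketch of batched connection-state export: only connections alive
    longer than 3 s are synchronized, and a batch is flushed when it would
    fill an MTU-sized frame or a timeout elapses."""
    MTU = 1500
    ENTRY_BYTES = 40    # assumed serialized size of one connection entry
    TIMEOUT_S = 0.1     # assumed flush timeout

    def __init__(self, now=time.monotonic):
        self.now = now
        self.batch = []
        self.batch_started = None
        self.flushed = []  # stand-in for frames sent to peer nodes

    def on_connection(self, conn_id, age_s):
        if age_s <= 3.0:    # short-lived flows are not worth syncing
            return
        if not self.batch:
            self.batch_started = self.now()
        self.batch.append(conn_id)
        if len(self.batch) * self.ENTRY_BYTES >= self.MTU:
            self.flush()

    def tick(self):
        # Called periodically: flush a partial batch that has waited too long.
        if self.batch and self.now() - self.batch_started >= self.TIMEOUT_S:
            self.flush()

    def flush(self):
        self.flushed.append(list(self.batch))
        self.batch.clear()

sync = LinkTableSync()
sync.on_connection("short-lived", age_s=1.0)   # filtered out: under 3 s
for i in range(38):
    sync.on_connection(f"conn-{i}", age_s=10.0)
# 38 entries x 40 B = 1520 B >= MTU, so the batch was flushed
```

Batching by MTU amortizes per-packet overhead, while the timeout bounds how stale a peer's view of the link table can become.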

Fault Detection and Localization

A color‑mark probing system can locate failures within one minute. It sends TCP half‑handshake probes every 5 seconds, marking trace points (TP) for path tracking and drop points (DP) for drop‑reason logging. Example cases show TP count drops on node crashes and DP spikes on jitter, enabling rapid diagnosis.
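The diagnosis rule from those cases can be caricatured as a threshold check over the TP/DP counters: a drop in trace-point observations suggests a crashed node on the path, while a spike in drop-point records suggests loss or jitter. The 50% and 2× thresholds below are invented for illustration.

```python
def diagnose(tp_count, dp_count, tp_baseline, dp_baseline):
    """Toy classifier over probe counters: TP = trace points seen along
    the path, DP = drop points logged with a drop reason."""
    if tp_count < 0.5 * tp_baseline:
        # Probes stopped being traced partway: a node likely crashed.
        return "suspect node failure on path"
    if dp_count > 2 * max(dp_baseline, 1):
        # Drop records spiked relative to baseline: loss or jitter.
        return "suspect packet loss / jitter"
    return "healthy"

assert diagnose(10, 0, tp_baseline=100, dp_baseline=0) == "suspect node failure on path"
assert diagnose(100, 50, tp_baseline=100, dp_baseline=5) == "suspect packet loss / jitter"
```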

Operational Experience

Eight years of production have yielded best‑practice insights across five dimensions:

Blast‑Radius Isolation: Hierarchical design isolates failures at region, AZ, cluster, rack, machine, and link levels. Redundancy follows a 50% principle, ensuring service continuity even if half the capacity at any level fails.

Redundancy Strategy: Active‑Active within AZs, hot‑standby (50% extra nodes) for seconds‑level failover, and cold‑standby pools for non‑critical workloads.

Cluster Management: Static hardware/software checks, progressive traffic ramp‑up, auto‑scaling triggers (CPU > 70% or connection thresholds), graceful shutdown (BGP withdraw before resource release), gray‑release (5% of nodes first), and one‑minute rollback.
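The scale-out trigger reduces to a simple predicate; the connection-count threshold below is a placeholder, since the article gives only the 70% CPU figure.

```python
def should_scale_out(cpu_util, active_connections,
                     cpu_threshold=0.70, conn_threshold=200_000_000):
    """Sketch of the auto-scaling trigger: scale out when CPU exceeds 70%
    or the connection count crosses a threshold (threshold value assumed)."""
    return cpu_util > cpu_threshold or active_connections > conn_threshold

assert should_scale_out(0.85, 0)            # CPU-driven trigger
assert should_scale_out(0.10, 300_000_000)  # connection-driven trigger
assert not should_scale_out(0.10, 1_000)    # healthy: no action
```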

Protocol Optimization: Migration from multicast to UDP unicast improves reliability to 99.999% at a 15% bandwidth cost; odd‑even routing preserves flow affinity.

Security: Layered DDoS cleaning (edge filtering, TGW rate limiting per VIP), dynamic isolation moving attacked VIPs to dedicated cleaning clusters, blacklist learning, and protocol compliance checks (dropping malformed GRE or QUIC frames).
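Per-VIP rate limiting is commonly implemented as a token bucket. The article does not specify TGW's limiter, so this is a generic sketch with invented parameters, showing how each VIP gets an independent budget.

```python
class VipRateLimiter:
    """Generic per-VIP token bucket (illustrative, not TGW's actual limiter).
    Each VIP refills at `rate_pps` tokens/second up to `burst` tokens."""
    def __init__(self, rate_pps, burst):
        self.rate, self.burst = rate_pps, burst
        self.buckets = {}  # vip -> (tokens, last_timestamp)

    def allow(self, vip, now):
        tokens, last = self.buckets.get(vip, (self.burst, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[vip] = (tokens - 1.0, now)
            return True
        self.buckets[vip] = (tokens, now)
        return False

rl = VipRateLimiter(rate_pps=10, burst=2)
assert rl.allow("vip-A", 0.0)
assert rl.allow("vip-A", 0.0)
assert not rl.allow("vip-A", 0.0)   # vip-A's bucket is drained
assert rl.allow("vip-B", 0.0)       # other VIPs are unaffected
```

Keeping the budget per VIP is what bounds the blast radius: an attacked VIP exhausts only its own bucket before being moved to a cleaning cluster.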

Conclusion and Outlook

The paper provides a comprehensive view of TGW’s architecture, performance optimizations, fault‑tolerance mechanisms, and operational practices, offering reusable references for future cloud gateway designs. Upcoming work will incorporate hardware offload and programmable forwarding to push performance and reliability even further, supporting the next generation of intelligent network infrastructure.

Tags: Network Architecture, High Availability, Fault Tolerance, Tencent, Cloud Gateway, State Migration, USENIX