How Tencent’s HARP Protocol Powers 10,000‑Node Networks with Zero Downtime
Tencent Cloud’s self‑developed HARP protocol delivers ultra‑high‑performance, highly available, and scalable data‑center networking. It supports more than 10,000 nodes, recovers from faults at the microsecond level, and reaches up to 200 Gbps of throughput, making it well suited to AI training, storage, and big‑data workloads.
What is HARP
HARP is a self‑developed high‑performance transport protocol for data‑center networks. It guarantees reliable end‑to‑end delivery while providing high availability, scalability and performance for upper‑layer applications.
Why Tencent Cloud built a new protocol
Existing protocols (TCP/IP, RoCE) cannot meet the growing reliability and bandwidth demands of large‑scale data‑center workloads. Switch failures and the rising share of latency‑sensitive traffic call for a transport that keeps running with zero downtime and scales to massive node counts.
Key advantages (“three highs”)
High availability: Multi‑path parallel transmission and real‑time link detection enable microsecond‑level failover, making network faults invisible to applications.
High scalability: A shared‑connection design reduces hardware resource consumption, supporting more than 10,000 nodes without performance degradation.
High performance: A custom congestion‑control algorithm (PEAD) delivers up to 200 Gbps, reduces median message completion time by 35 %, and cuts 99th‑percentile queuing delay by 90 % compared with TCP.
Technical implementation
HARP offers configurable connection sharing at the bare‑metal, VM, or host level to reduce the total number of connections. It also separates the hardware transaction layer from the software transaction layer: hardware handles fast, reliable transmission, while software provides flexible message processing.
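To see why connection sharing matters at 10,000 nodes, the back‑of‑the‑envelope sketch below counts how much connection state one host must hold in a full mesh at different sharing granularities. The topology (8 VMs per host, 16 communicating processes per VM), the function names, and the per‑process "no sharing" baseline are all hypothetical; the article does not say exactly how HARP's bare‑metal mode maps onto these levels.

```python
def connections_per_host(num_hosts: int, endpoints_per_host: int) -> int:
    """Connections one host holds in a full mesh with all other hosts.

    Each local endpoint keeps one connection to every endpoint on every
    other host. (Simplified model; hypothetical, for illustration only.)
    """
    return endpoints_per_host * (num_hosts - 1) * endpoints_per_host


def compare_sharing(num_hosts: int, vms_per_host: int, procs_per_vm: int) -> dict[str, int]:
    """Per-host connection counts at three hypothetical sharing levels."""
    return {
        # Every process owns its own connections: nothing is shared.
        "per-process (no sharing)": connections_per_host(num_hosts, vms_per_host * procs_per_vm),
        # All processes inside a VM share that VM's connections.
        "VM-level sharing": connections_per_host(num_hosts, vms_per_host),
        # The whole host shares one connection per remote host.
        "host-level sharing": connections_per_host(num_hosts, 1),
    }


if __name__ == "__main__":
    for mode, count in compare_sharing(10_000, 8, 16).items():
        print(f"{mode:>26}: {count:,}")
```

Under these assumed numbers, each host would need 163,823,616 connections without sharing, 639,936 with VM‑level sharing, and 9,999 with host‑level sharing, which illustrates how coarser sharing shrinks the per‑connection state that NIC hardware has to keep.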
Deterministic multipath transmission with microsecond‑level path switching ensures that connections are never broken: when a path fails, the failure is detected and a new path is selected within 100 µs. Compared with the roughly 1 s that TCP typically needs to reconnect, this cuts recovery time by more than 99.9 %.
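The failover behavior can be pictured as a sender that spreads traffic across several paths and stops using any path that goes silent. The sketch below is a minimal simulation with an explicit microsecond clock; the class names, the 50 µs detection timeout, and the round‑robin policy are assumptions for illustration, not HARP's actual link‑detection or path‑selection logic.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    path_id: int
    last_ack_us: float = 0.0
    healthy: bool = True


@dataclass
class MultipathSelector:
    """Illustrative multipath sender with timeout-based failure detection.

    Hypothetical model: a path that has not been acknowledged for longer
    than detect_timeout_us is marked down and skipped until it acks again.
    """
    num_paths: int
    detect_timeout_us: float = 50.0  # hypothetical detection budget
    paths: list = field(default_factory=list, init=False)
    _next: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.paths = [Path(i) for i in range(self.num_paths)]

    def on_ack(self, path_id: int, now_us: float) -> None:
        # Any acknowledgement (data or probe) proves the path is alive.
        p = self.paths[path_id]
        p.last_ack_us = now_us
        p.healthy = True

    def check_paths(self, now_us: float) -> None:
        # Mark paths that have been silent past the timeout as failed.
        for p in self.paths:
            if now_us - p.last_ack_us > self.detect_timeout_us:
                p.healthy = False

    def pick_path(self, now_us: float) -> int:
        """Round-robin over healthy paths; raise if every path is down."""
        self.check_paths(now_us)
        for _ in range(self.num_paths):
            candidate = self.paths[self._next]
            self._next = (self._next + 1) % self.num_paths
            if candidate.healthy:
                return candidate.path_id
        raise RuntimeError("no healthy path available")


if __name__ == "__main__":
    sel = MultipathSelector(num_paths=4)
    for pid in range(4):
        sel.on_ack(pid, 0.0)
    for pid in (0, 2, 3):  # path 1 goes silent after t=0
        sel.on_ack(pid, 40.0)
    print([sel.pick_path(70.0) for _ in range(4)])
```

Because every send consults path health first, a dead path stops receiving traffic as soon as its silence exceeds the detection timeout, and the connection itself is never torn down. A real implementation would also need active probing of idle paths and retransmission of packets that were in flight on the failed path.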
Use cases
Initially designed for storage and high‑performance computing, HARP is now used in Tencent Cloud CBS (Cloud Block Storage) and Elastic RDMA services, and can benefit AI training, key‑value stores, and distributed big‑data applications that demand low latency and high throughput.
Tencent Tech
Tencent's official tech account. Delivering quality technical content to serve developers.