HARP – Tencent Cloud's High‑Performance, Highly Available Network Transmission Protocol
HARP is Tencent Cloud's in-house network transmission protocol for storage, HPC, AI, and big data workloads. It reroutes around switch failures within 100 µs without packet loss, and combines low latency, high bandwidth, scalable connections, and custom congestion control.
In the cloud computing era, switches are the “neural hub” of the data center, and a single switch failure can affect upper‑level services. When a switch fails, Tencent Cloud’s next‑generation high‑performance network can find a new path within 100 µs, achieving zero packet loss.
Tencent Cloud has been developing its own high‑performance network transmission protocol, HARP (Highly Available and Reliable Protocol). HARP is already deployed in block storage services and is a standard capability of Tencent’s custom‑designed Yinsan smart NICs and Xuanling chips.
Challenges in data‑center networks
1) Reliability – even a 0.15 % annual hardware failure rate of switches can cause throughput drops, latency spikes, or complete connection loss for latency‑sensitive applications such as high‑performance storage.
2) Performance bottlenecks – modern workloads demand high bandwidth (100 Gbps‑400 Gbps), ultra‑low and consistent latency, and massive scale. Existing TCP stacks consume excessive CPU, while RoCE v2 has limited connection‑state resources; both lead to congestion‑control and reliability issues.
3) Congestion control – TCP‑based solutions perform poorly in data‑center environments, and RoCE v2 suffers from limited concurrent connections and inadequate fault recovery.
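To see why a failure rate as small as 0.15 % per year still matters at data‑center scale, the arithmetic can be sketched as follows. The 0.15 % figure comes from the text above; the fleet size of 10,000 switches is a hypothetical value chosen only for illustration, and the sketch assumes switch failures are independent.

```python
# Rough arithmetic only. The 0.15 % annual failure rate is taken from the text;
# the fleet size is hypothetical, and failures are assumed to be independent.

ANNUAL_FAILURE_RATE = 0.0015  # 0.15 % per switch per year


def expected_failures_per_year(num_switches: int,
                               rate: float = ANNUAL_FAILURE_RATE) -> float:
    """Expected number of switch failures in one year across the fleet."""
    return num_switches * rate


def prob_at_least_one_failure(num_switches: int,
                              rate: float = ANNUAL_FAILURE_RATE) -> float:
    """Probability that at least one switch in the fleet fails within a year."""
    return 1.0 - (1.0 - rate) ** num_switches


if __name__ == "__main__":
    fleet = 10_000  # hypothetical fleet size
    print(f"expected failures per year: {expected_failures_per_year(fleet):.1f}")
    print(f"P(at least one failure):    {prob_at_least_one_failure(fleet):.6f}")
```

Under these assumptions a fleet of that size sees roughly 15 switch failures per year, so a failure somewhere in the network is close to certain, and the transport has to ride through it.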
Why a new protocol?
Neither TCP nor RoCE v2 meets all of these requirements at once. HARP is designed to provide high reliability, high bandwidth, low latency, and scalability simultaneously.
Key technical innovations of HARP
1) Hardware‑software transaction separation – a layered design separates a lightweight reliable‑transport layer from a flexible software transaction layer, enabling efficient packet‑level reliability while keeping hardware costs low.
2) Granular, configurable shared connections – HARP supports bare‑metal, VM‑level, and host‑level shared connections, drastically reducing the number of connections required in large clusters (e.g., 10 K nodes × 100 processes need only 10 K connections).
3) Self‑developed high‑performance congestion control – the PEAD algorithm leverages ECN to achieve 35 % lower median flow‑completion time, 70 % lower p99 latency, and 90 % lower queueing delay compared with TCP. An alternative DARC algorithm works even without switch ECN support.
4) Deterministic multipath transmission and rapid fault switching – each connection uses multiple non‑overlapping paths, each with independent congestion detection. When a path fails, HARP detects the fault within hundreds of microseconds and switches to a new path, so the probability of losing a connection is near zero, in contrast to TCP.
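The fault‑switching idea in item 4 can be illustrated with a small sketch. This is not HARP's implementation: the class names, the acknowledgement‑timeout detection rule, the 300 µs threshold, and the pool of spare paths are all assumptions made for illustration, and per‑path congestion detection is left out.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical detection threshold; the source only says "hundreds of microseconds".
FAULT_TIMEOUT_US = 300


@dataclass
class Path:
    path_id: int
    last_ack_us: int = 0   # time of the most recent acknowledgement on this path
    healthy: bool = True


@dataclass
class MultipathConnection:
    active: List[Path]                                 # paths currently carrying traffic
    spares: List[Path] = field(default_factory=list)   # standby paths
    _next: int = 0                                     # round-robin cursor

    def on_ack(self, path_id: int, now_us: int) -> None:
        """Record that an acknowledgement arrived on the given path."""
        for p in self.active:
            if p.path_id == path_id:
                p.last_ack_us = now_us

    def check_paths(self, now_us: int) -> List[int]:
        """Mark silent paths as failed and swap in a spare; return failed path ids."""
        failed = []
        for i, p in enumerate(self.active):
            if not p.healthy:
                continue  # already failed and waiting for a spare
            if now_us - p.last_ack_us > FAULT_TIMEOUT_US:
                p.healthy = False
                failed.append(p.path_id)
                if self.spares:
                    spare = self.spares.pop(0)
                    spare.last_ack_us = now_us
                    self.active[i] = spare
        return failed

    def pick_path(self) -> Path:
        """Choose the next healthy path in round-robin order."""
        healthy = [p for p in self.active if p.healthy]
        if not healthy:
            raise RuntimeError("no healthy path available")
        path = healthy[self._next % len(healthy)]
        self._next += 1
        return path


if __name__ == "__main__":
    conn = MultipathConnection(active=[Path(1), Path(2)], spares=[Path(3)])
    conn.on_ack(1, 1000)
    conn.on_ack(2, 1000)
    conn.on_ack(1, 1250)            # path 2 goes silent after t = 1000 µs
    print(conn.check_paths(1350))   # path 2 exceeded the timeout
    print([p.path_id for p in conn.active])
```

Because the connection keeps sending on the remaining healthy paths while a spare is brought in, the application above it never sees the connection drop. That is the property the text contrasts with single‑path TCP.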
Advantages
• High availability: guarantees service continuity for storage workloads even during switch failures.
• High scalability: supports tens of thousands of nodes with linear performance growth.
• High performance: provides ultra‑low average and tail latency while fully utilizing 100 Gbps‑400 Gbps bandwidth.
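The scalability claim rests on shared connections, and the savings can be checked with quick arithmetic. The sketch below uses the article's example of 10 K nodes with 100 processes each. It reads the "10 K connections" figure as a per‑node count under host‑level sharing, with one shared connection per peer host. That reading and the function names are assumptions, not details confirmed by the source.

```python
def per_process_connections(nodes: int, procs_per_node: int) -> int:
    """Connections one node holds if every local process connects to every remote process."""
    remote_procs = (nodes - 1) * procs_per_node
    return procs_per_node * remote_procs


def host_shared_connections(nodes: int) -> int:
    """Connections one node holds if all local processes share one connection per peer host."""
    return nodes - 1


if __name__ == "__main__":
    nodes, procs = 10_000, 100
    print(per_process_connections(nodes, procs))  # 99990000, about 10^8 per node
    print(host_shared_connections(nodes))         # 9999, about 10 K per node
```

Under this reading, host‑level sharing cuts the connection state on each node by a factor of 100 × 100 = 10,000, which is what keeps that state small enough to hold in NIC hardware.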
Typical applications
HARP is used in Tencent Cloud Block Storage (CBS), high‑performance computing (HPC), AI training, key‑value stores, distributed big‑data workloads, and VPC networking.
Future outlook
Tencent Cloud is expanding HARP’s ecosystem by integrating it with sockets, IB Verbs, libfabric, and UCX, and by offloading the full stack to custom ASICs (Yinsan, Xuanling). The goal is “HARP for everything”: making the protocol a universal high‑performance transport layer for cloud services.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.