Design and Implementation of Nickel: A High‑Performance XDP‑Based Layer‑4 Edge Load Balancer
Nickel is a high‑performance, XDP‑based layer‑4 (L4) edge load balancer that replaces Bilibili's non‑clustered CDN edge architecture with a clustered one. Its control plane and data plane are co‑deployed with backend services on the same machines, and it uses hash‑based forwarding, dynamic health checks, and two‑hop routing to lower cost, even out load across machines, and eliminate dedicated load‑balancer hardware.
Background
Bilibili's CDN edge nodes were previously built on a non‑clustered architecture, which caused complex scheduling logic, uneven load across machines in the same data center, exposure of many public IPs, and coarse-grained traffic‑splitting for gray releases. Traditional L4 load balancers such as SLB, LVS, and DPVS could meet functional requirements but required dedicated machines, increasing cost and wasting resources.
To address these issues, the team investigated Cloudflare's XDP‑based high‑performance L4 load balancer Unimog and built a Bilibili‑specific solution called Nickel (Ni), which can be co‑deployed with backend services on the same machines.
Key benefits of Nickel include lower cost (no dedicated machines), lightweight functionality, and the ability to tailor optimizations to Bilibili's traffic patterns.
Architecture Design
Nickel consists of a control plane and a data plane.
2.1 Overall Design
The control plane handles service discovery, configuration management, data reporting, and dynamic maintenance of LB rules. The data plane, built on XDP and Linux Traffic Control (TC), forwards packets according to a hash‑based forwarding table.
2.2 Control Plane
The control plane is based on the open‑source kglb framework and has been extended for edge‑specific requirements. It maintains a forwarding table that maps a service's virtual IP (VIP) and port to a set of destination IPs (DIPs). It also provides health‑check configuration, resource‑usage monitoring, and dynamic load balancing based on CPU or QPS.
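The forwarding table the control plane maintains can be sketched as a map from VIP:port to a DIP list. The Go sketch below is illustrative only; the names (`VipKey`, `ForwardingTable`) and shapes are assumptions, not the actual kglb or Nickel types.

```go
package main

import "fmt"

// VipKey identifies a service by its virtual IP and port
// (hypothetical name; the real types differ).
type VipKey struct {
	VIP  string
	Port uint16
}

// ForwardingTable maps each VIP:port to the set of healthy
// destination IPs (DIPs) that may receive its traffic.
type ForwardingTable struct {
	entries map[VipKey][]string
}

func NewForwardingTable() *ForwardingTable {
	return &ForwardingTable{entries: make(map[VipKey][]string)}
}

// SetDIPs replaces the DIP set for a service; conceptually this is
// what happens when health checks or weights change.
func (t *ForwardingTable) SetDIPs(k VipKey, dips []string) {
	t.entries[k] = dips
}

// DIPs returns the current DIP set for a service.
func (t *ForwardingTable) DIPs(k VipKey) []string {
	return t.entries[k]
}

func main() {
	t := NewForwardingTable()
	k := VipKey{VIP: "203.0.113.10", Port: 443}
	t.SetDIPs(k, []string{"10.0.0.1", "10.0.0.2"})
	fmt.Println(t.DIPs(k)) // [10.0.0.1 10.0.0.2]
}
```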
Example HTTP health‑check configuration:

{
  "checker": {
    "http": {
      "scheme": "http",
      "uri": "/",
      "check_port": 9080,
      "codes": [200]
    }
  },
  "fall_count": 2,
  "interval_ms": 2000,
  "rise_count": 2
}

The control plane periodically collects server resource usage, calculates per‑machine load, and adjusts traffic distribution accordingly. It also supports weight‑based balancing using a hash of the four‑tuple (source IP, source port, destination IP, destination port), and when the forwarding table changes it can dynamically switch between two hops (first hop and second hop).
Two‑hop logic ensures that when a packet is redirected to a new DIP, existing TCP connections are not broken: if the first‑hop server does not own the socket, the packet is forwarded to the second‑hop server, which may then forward it back if necessary.
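Under the stated assumptions (each host can tell whether it already owns a socket for the connection), the two‑hop decision might look like the sketch below; `hopPair` and `decide` are hypothetical names, not Nickel's implementation.

```go
package main

import "fmt"

// hopPair is the per-bucket value kept after a table change: the
// new owner (first hop) and the previous owner (second hop).
type hopPair struct {
	first, second string
}

// decide mimics the check described above: if this host owns a
// socket for the connection, deliver locally; otherwise send the
// packet on to the other hop so established connections survive
// the table change.
func decide(host string, h hopPair, hasSocket func(host string) bool) string {
	if hasSocket(host) {
		return "deliver"
	}
	if host == h.first {
		return "forward to " + h.second
	}
	return "forward to " + h.first
}

func main() {
	h := hopPair{first: "10.0.0.2", second: "10.0.0.1"}
	// An established connection still owned by the old DIP:
	owned := map[string]bool{"10.0.0.1": true}
	has := func(host string) bool { return owned[host] }
	fmt.Println(decide("10.0.0.2", h, has)) // forward to 10.0.0.1
	fmt.Println(decide("10.0.0.1", h, has)) // deliver
}
```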
2.3 Data Plane
The data plane uses XDP, which runs at the earliest point of the Linux receive path, processing packets before the kernel protocol stack for maximum performance. XDP has three execution modes (Offload, Native, Generic); Nickel runs in Native mode to balance performance with hardware compatibility.
The forwarding table is stored as a map‑in‑map eBPF map. The outer map key is the service’s IP/port pair; the inner map key is the hash bucket index, and the value contains a C struct with the first‑hop and second‑hop IPs. When a packet arrives, XDP looks up the bucket, wraps the packet with a GUE header containing the two hop IPs, and passes it to TC.
TC then checks whether a socket for the connection already exists. If it does, the packet is delivered to the application; otherwise, TC forwards the packet to the appropriate hop based on the GUE header.
Because the current edge clusters do not support true VIPs, Nickel uses netfilter conntrack to perform SNAT on ingress packets and DNAT on egress packets. Although conntrack adds some overhead, it is acceptable for the current workload.
Application Scenarios
3.1 Dynamic Acceleration
Nickel aggregates all machines in a data center into a single logical cluster, providing a converged traffic entry point. Health checks automatically drain unhealthy services, and CPU‑based load metrics dynamically adjust traffic shares, reducing manual routing complexity.
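The CPU‑driven share adjustment could be sketched as a simple weight‑derating function. The policy and parameters below are illustrative assumptions; the article does not give Nickel's actual formula.

```go
package main

import "fmt"

// weightFromCPU shrinks a machine's traffic weight as its CPU
// utilization exceeds a target ceiling (e.g. 70%).
func weightFromCPU(base int, cpuUtil, target float64) int {
	if cpuUtil <= target {
		return base
	}
	// Shrink the weight proportionally to how far past target we are.
	w := int(float64(base) * target / cpuUtil)
	if w < 1 {
		w = 1 // keep a floor so the machine can recover traffic later
	}
	return w
}

func main() {
	fmt.Println(weightFromCPU(100, 0.50, 0.70)) // under target: keeps 100
	fmt.Println(weightFromCPU(100, 0.90, 0.70)) // over target: derated
}
```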
3.2 VoD and Live CDN Clusters
Deploying Nickel in VoD and live‑streaming CDN clusters reduces dispatch latency from minutes to seconds, improves load distribution, and evens out CPU utilization across machines. Real‑world measurements show a noticeably more even QPS distribution after deployment.
Future Outlook
Planned enhancements include blacklist/whitelist filtering in XDP, support for QUIC‑LB routing as specified in the IETF QUIC‑LB draft, and a true VIP‑based DSR (direct server return) mode to eliminate the conntrack bottleneck.
References
[1] https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/
[2] https://github.com/dropbox/kglb
[3] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-olteanu.pdf
[4] https://mp.weixin.qq.com/s/uPHVo-4rGZNvPXLKHPq9QQ
[5] https://github.com/github/glb-director/blob/master/docs/development/gue-header.md
Bilibili Tech