High‑Performance Layer‑4 Software Load Balancer TDLB Based on DPVS and DPDK
The article describes how Trip.com built a high‑performance, software‑based layer‑4 load balancer (TDLB) using DPVS and DPDK, detailing its lock‑free session handling, user‑IP pass‑through, asynchronous logging, cluster session synchronization, resource isolation, configuration management via etcd/operator, health‑check strategies, and multi‑dimensional monitoring.
Introduction
Trip.com’s traffic ingress architecture combines layer‑4 and layer‑7 load balancing, but the hardware‑based layer‑4 solution suffered from high cost, long procurement cycles, and limited HA capabilities. To meet rapid business growth, the team sought an open‑source, high‑performance software alternative.
TDLB High‑Performance Implementation
DPDK Integration
TDLB builds on DPVS, an open‑source layer‑4 load balancer derived from LVS, together with DPDK for kernel bypass: NICs are polled from user space, which removes interrupt and context‑switch overhead and improves cache hit rates.
Lock‑Free Session Design
TDLB adopts a full‑NAT mode with per‑core session tables, ensuring that both inbound and outbound traffic for a flow are processed by the same core, eliminating inter‑core lock contention.
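A minimal sketch of the per‑core idea (not DPVS's actual data structures): each core owns its own session table and is the only core that ever touches it, so no locks are needed. The `flow_key` layout, table sizes, and FNV hash below are illustrative stand‑ins for the NIC's RSS hash.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_CORES   4
#define TABLE_SIZE 1024          /* buckets per core */

/* Hypothetical 5-tuple key; DPVS keys sessions on a similar tuple. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct session {
    struct flow_key key;
    int in_use;
};

/* One session table per core: lock-free because only the owning core
 * (selected by the NIC's RSS hash in a real deployment) touches it. */
static struct session tables[NB_CORES][TABLE_SIZE];

/* Toy stand-in for the NIC's RSS hash (FNV-1a over the key bytes). */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;
    const uint8_t *p = (const uint8_t *)k;
    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static unsigned core_for_flow(const struct flow_key *k)
{
    return flow_hash(k) % NB_CORES;
}

/* Find-or-create; runs on the owning core only, so no locking. */
static struct session *session_get(unsigned core, const struct flow_key *k)
{
    uint32_t b = flow_hash(k) % TABLE_SIZE;   /* linear probing */
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        struct session *s = &tables[core][(b + i) % TABLE_SIZE];
        if (!s->in_use) {
            s->in_use = 1;
            s->key = *k;
            return s;
        }
        if (memcmp(&s->key, k, sizeof(*k)) == 0)
            return s;
    }
    return NULL;                              /* table full */
}
```

Because both directions of a flow are delivered to the same core, lookups never cross core boundaries and no session entry is ever shared.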
User Source IP Pass‑Through
Both TOA (TCP Option Address) and the Proxy Protocol are supported, enabling backend services to recover the original client IP that full‑NAT would otherwise rewrite, without requiring kernel modules.
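TOA carries the client address as a custom TCP option on the connection to the backend. A sketch of parsing such an option follows; the kind/length constants (kind 254, length 8: kind, length, 16‑bit port, 32‑bit IPv4 address) follow the common open‑source `toa` module and should be treated as assumptions here.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define TOA_KIND 254   /* assumed option kind, per the common toa module */
#define TOA_LEN  8

struct toa_addr {
    uint16_t port;     /* host byte order after parsing */
    uint32_t ip;       /* host byte order after parsing */
};

/* Walk the TCP options area; return 1 and fill *out if TOA is found. */
static int toa_parse(const uint8_t *opts, size_t len, struct toa_addr *out)
{
    size_t i = 0;
    while (i < len) {
        uint8_t kind = opts[i];
        if (kind == 0)               /* End of Option List */
            break;
        if (kind == 1) {             /* NOP: single byte */
            i++;
            continue;
        }
        if (i + 1 >= len)
            break;
        uint8_t olen = opts[i + 1];
        if (olen < 2 || i + olen > len)
            break;                   /* malformed option */
        if (kind == TOA_KIND && olen == TOA_LEN) {
            out->port = (uint16_t)(opts[i + 2] << 8 | opts[i + 3]);
            out->ip   = (uint32_t)opts[i + 4] << 24 |
                        (uint32_t)opts[i + 5] << 16 |
                        (uint32_t)opts[i + 6] << 8  |
                        (uint32_t)opts[i + 7];
            return 1;
        }
        i += olen;
    }
    return 0;
}
```

The Proxy Protocol achieves the same goal in-band (a text or binary header prepended to the stream), which ordinary user-space servers can read directly.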
Asynchronous Log Writing
Log messages are queued per core and written by a dedicated logging core, avoiding I/O lock contention that could disrupt packet processing or BGP sessions.
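The pattern is a single‑producer/single‑consumer ring per worker core, drained by the one logging core. DPVS proper builds on DPDK's `rte_ring`; the stdlib‑only sketch below illustrates the same idea with C11 atomics (sizes and message format are illustrative).

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define RING_SIZE 256   /* slots per worker core */

/* One SPSC ring per worker core; the dedicated logging core is the
 * only consumer, so neither side ever takes a lock. */
struct log_ring {
    char slots[RING_SIZE][128];
    _Atomic unsigned head;   /* next slot to write (producer side) */
    _Atomic unsigned tail;   /* next slot to read (consumer side) */
};

/* Worker core: enqueue and return immediately; the packet path never
 * blocks on disk I/O. */
static int log_enqueue(struct log_ring *r, const char *msg)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return 0;                            /* full: drop, don't stall */
    strncpy(r->slots[head % RING_SIZE], msg, sizeof r->slots[0] - 1);
    r->slots[head % RING_SIZE][sizeof r->slots[0] - 1] = '\0';
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Logging core: drain one message, e.g. into a file. */
static int log_dequeue(struct log_ring *r, char *out, size_t outlen)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return 0;                            /* empty */
    strncpy(out, r->slots[tail % RING_SIZE], outlen - 1);
    out[outlen - 1] = '\0';
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

Dropping on a full ring is a deliberate trade‑off: losing a log line is preferable to stalling packet processing or starving a BGP keepalive.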
Cluster Session Synchronization
In multi‑active mode, session information is synchronized across cores and nodes. Per‑core internal IPs (SNAT IPs) and FDIR are used so that return traffic is steered to the same core that originated the session, preserving flow affinity.
Two synchronization types are provided: incremental sync for new connections and full sync when a new server joins the cluster.
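The per‑core SNAT IP trick can be sketched in a few lines: each core sources backend connections from its own local IP, and an FDIR rule of the form "dst IP == core i's SNAT IP → queue i" lands the reply on the originating core. The table below models that rule in software; the addresses are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NB_CORES 4

/* Each worker core owns one local (SNAT) IP. Illustrative values:
 * 192.168.0.1 through 192.168.0.4. */
static const uint32_t snat_ip[NB_CORES] = {
    0xC0A80001, 0xC0A80002, 0xC0A80003, 0xC0A80004,
};

/* Source IP for an outbound (core -> backend) connection. */
static uint32_t snat_ip_for_core(unsigned core)
{
    return snat_ip[core];
}

/* Steer a reply packet by destination IP, as the NIC's FDIR rule
 * would in hardware. */
static int core_for_reply(uint32_t dst_ip)
{
    for (unsigned i = 0; i < NB_CORES; i++)
        if (snat_ip[i] == dst_ip)
            return (int)i;
    return -1;   /* not a local SNAT IP */
}
```

Because the steering key is an address the load balancer itself chose, flow affinity survives even though the backend never sees the client's tuple.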
Resource Isolation
RSS and FDIR distribute packets to specific cores, isolating data paths. NUMA‑aware allocation ensures that each core uses local NIC resources, avoiding cross‑NUMA traffic.
Control‑plane traffic (BGP, health checks) is isolated from data‑plane traffic by assigning it to a dedicated queue.
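A toy classifier showing the split: a pinned rule sends control‑plane flows (here, just BGP by its well‑known port 179) to a dedicated queue, while everything else is spread over the data queues by an RSS‑style hash. Real rules are programmed into the NIC; queue counts here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NB_DATA_QUEUES 4
#define CTRL_QUEUE     NB_DATA_QUEUES   /* one extra queue for control traffic */
#define BGP_PORT       179

/* Decide which RX queue (and thus which core) a packet lands on. */
static unsigned queue_for_packet(uint16_t dst_port, uint32_t rss_hash)
{
    if (dst_port == BGP_PORT)
        return CTRL_QUEUE;               /* never competes with the data path */
    return rss_hash % NB_DATA_QUEUES;
}
```

With this split, a data‑plane burst cannot delay BGP keepalives, and a BGP flap cannot perturb per‑core session handling.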
Cluster Configuration Management
Configuration is stored in etcd. Each TDLB instance runs an operator that watches etcd keys, applies changes, and writes back version information, guaranteeing consistent configuration across the cluster.
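The reconcile step each operator runs when a watch fires can be sketched as: apply the desired config only if its revision is newer than what is running, then record the applied revision (in reality, by writing a status key back to etcd). The types and field names below are illustrative, not the DPVS or etcd API.

```c
#include <assert.h>
#include <string.h>

struct config {
    long revision;       /* etcd-style monotonically increasing revision */
    char vip[32];        /* e.g. the virtual service address */
};

struct instance {
    long applied_revision;
    char active_vip[32];
};

/* Returns 1 if the config was (re)applied, 0 if already current. */
static int reconcile(struct instance *inst, const struct config *desired)
{
    if (desired->revision <= inst->applied_revision)
        return 0;                          /* stale or duplicate event */
    strncpy(inst->active_vip, desired->vip, sizeof inst->active_vip - 1);
    inst->active_vip[sizeof inst->active_vip - 1] = '\0';
    inst->applied_revision = desired->revision;  /* reported back for audit */
    return 1;
}
```

Making the step idempotent on revision numbers is what lets every instance converge to the same state regardless of how many watch events it receives or in what order.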
Health‑Check Strategy
Health checks are performed on every NIC; failures on one NIC affect only the services bound to that NIC, improving fault tolerance.
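The fault‑containment property reduces to a binding: each service is tied to one NIC, and a failed probe on a NIC takes down only the services bound to it. A minimal model (bindings are illustrative):

```c
#include <assert.h>

#define NB_NICS     2
#define NB_SERVICES 4

/* Which NIC each service is bound to (illustrative bindings). */
static const int service_nic[NB_SERVICES] = { 0, 0, 1, 1 };
static int nic_healthy[NB_NICS] = { 1, 1 };

/* A service is advertised only while its NIC passes health checks. */
static int service_available(int svc)
{
    return nic_healthy[service_nic[svc]];
}
```

A single failed port (cable, transceiver, upstream switch) therefore withdraws only the affected services instead of the whole node.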
Multi‑Dimensional Monitoring
Metrics are collected per‑cluster, per‑server, per‑service, and per‑core using DPDK latency stats, and are exported to Prometheus/Grafana with alerts for rapid fault localization.
Conclusion
TDLB, built on DPVS and DPDK, has operated stably for nearly two years, supporting Trip.com’s services with lower cost, higher performance, and seamless integration into the private cloud, demonstrating the value of adopting open‑source solutions.
Ctrip Technology