
eBay's UFES Project: Cloud‑Native Edge Node Architecture and Migration from Hardware Load Balancers

This article describes how eBay’s UFES project designed its global edge‑node infrastructure and migrated it from hardware load balancers to a cloud‑native network built on Kubernetes, IPVS, Envoy, Contour, and automated PaaS tooling. It covers the challenges, the architecture, caching, and future plans.

Cloud Native Technology Community

Background – Edge nodes, also called PoPs (points of presence), are caching and load‑balancing devices placed close to users to accelerate access to eBay’s data‑center resources. Traditional edge nodes relied on hardware load balancers (Netscaler) and static VIP configurations.

Cloud‑Native Network – A cloud‑native network runs on Linux containers (Kubernetes) instead of dedicated hardware. eBay’s goal was to give edge nodes the properties of global distribution, extensibility, elasticity, and high security.

UFES (Unified Front End Services) – Initiated in 2017, UFES builds a low‑latency, highly available front‑end and edge‑computing platform on a cloud‑native network. It serves three public‑facing traffic domains (Desktop Web, Mobile Web, and Native Apps) and handles billions of requests daily.

Traditional Hardware Architecture – Edge traffic entered data centers via Netscaler devices in active‑standby mode, with dozens of VIPs and thousands of L7 rules. This architecture suffered from high cost, lack of elastic scaling, limited global POP count, and operational complexity.

New Edge Node Architecture – The design splits each edge node into a Front Proxy (a software load balancer) and a Data‑Center Gateway. The Front Proxy terminates SSL/TLS, handles anycast traffic, and hosts lightweight L7 rules. The Gateway provides L7 rule management and visibility, and integrates Envoy for authentication, rate limiting, and TLS.

L4 Load Balancing (IPVS) – IPVS was chosen over Netfilter, OVS, and eBPF for its maturity. A custom kernel module implements consistent‑hash scheduling, and the control plane generates IPVS service tables via Netlink. Features include horizontal scaling, Direct Server Return, and BGP‑advertised VIPs for ECMP.
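The consistent‑hash scheduling mentioned above can be illustrated in user space. The following is a minimal sketch, not eBay's kernel module: the class name, the use of MD5, the virtual‑node count, and the choice of client IP and port as the hash input are all assumptions made for illustration. It shows the property that makes consistent hashing attractive for a horizontally scaled L4 tier: when a backend is removed, only the flows that were mapped to it move.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (MD5 is used only for an even spread)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes. All names here are hypothetical."""

    def __init__(self, backends, vnodes=100):
        # Each backend owns `vnodes` points on the ring.
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def pick(self, client_ip: str, client_port: int) -> str:
        """Return the backend owning the first ring point at or after the flow hash."""
        h = _hash(f"{client_ip}:{client_port}")
        idx = bisect.bisect_left(self._points, h) % len(self._points)
        return self._ring[idx][1]


backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
ring = ConsistentHashRing(backends)
flows = [("198.51.100.%d" % i, 40000 + i) for i in range(200)]
before = {f: ring.pick(*f) for f in flows}

# Drop one backend: flows mapped to the survivors keep their assignment.
smaller = ConsistentHashRing(["10.0.0.1", "10.0.0.2"])
moved = [f for f in flows if before[f] != smaller.pick(*f)]
print(f"{len(moved)} of {len(flows)} flows were remapped")
```

Because every instance that builds the ring from the same backend list computes the same mapping, several load‑balancer instances behind ECMP would pick the same backend for a given flow without sharing connection state.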

L7 Load Balancing (Envoy/Contour) – Envoy handles TLS termination, routing, rate limiting, and security. Contour serves as the control plane, translating custom IngressRoute resources into Envoy configuration. Customizations include a bespoke apiVersion, HTTPMatchRequest support, per‑VIP listeners, weighted endpoints, and integration with eBay’s internal CA.
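To make the control‑plane role concrete, here is a rough sketch of translating a route resource into per‑VIP listener configuration with weighted endpoints. The resource layout, field names, and output shape are invented for illustration and are neither Contour's IngressRoute schema nor Envoy's configuration format.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    port: int
    weight: int


def translate(route_resource: dict) -> dict:
    """Turn a simplified, hypothetical route resource into one listener per VIP."""
    routes = []
    for r in route_resource["routes"]:
        services = [Service(**s) for s in r["services"]]
        total = sum(s.weight for s in services)
        if total <= 0:
            raise ValueError(f"route {r['prefix']!r} has no positive weight")
        routes.append({
            "match_prefix": r["prefix"],
            # Normalize weights to traffic shares that sum to 1.
            "weighted_clusters": [
                {"cluster": f"{s.name}:{s.port}", "share": s.weight / total}
                for s in services
            ],
        })
    # Longest prefix first so that more specific routes are matched first.
    routes.sort(key=lambda x: len(x["match_prefix"]), reverse=True)
    return {"listener": f"{route_resource['vip']}:443", "routes": routes}


resource = {
    "vip": "203.0.113.10",
    "routes": [
        {"prefix": "/", "services": [{"name": "web", "port": 8080, "weight": 100}]},
        {"prefix": "/api", "services": [
            {"name": "api-v1", "port": 8080, "weight": 90},
            {"name": "api-v2", "port": 8080, "weight": 10},
        ]},
    ],
}
config = translate(resource)
```

In this sketch the 90/10 split on `/api` stands in for the weighted endpoints the article mentions, for example to shift a small share of traffic to a new version.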

Caching (UFES Dynamic Content Cache) – Edge nodes cache dynamic page fragments using Apache Traffic Server (ATS) behind Envoy. A sidecar tracks request state, generates ATS cache keys, and updates cache status. For cached content, this reduces latency by 500‑700 ms.
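The article does not say how the sidecar derives its cache keys. The sketch below shows one plausible approach, under the assumption that a key is built from the host, the path, the query parameters in a canonical order, and a few request headers the content varies on. The function name, the `vary` parameter, and the key layout are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, headers: dict, vary=("accept-language",)) -> str:
    """Build a deterministic cache key (hypothetical scheme, not the ATS default)."""
    parts = urlsplit(url)
    # Sort query parameters so that ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    lowered = {k.lower(): v for k, v in headers.items()}
    # Include only the headers that change the rendered fragment.
    varied = "|".join(f"{h}={lowered.get(h, '')}" for h in vary)
    material = f"{parts.netloc.lower()}{parts.path}?{query}#{varied}"
    return hashlib.sha256(material.encode()).hexdigest()


k1 = cache_key("https://www.example.com/itm?a=1&b=2", {"Accept-Language": "en"})
k2 = cache_key("https://WWW.example.com/itm?b=2&a=1", {"accept-language": "en"})
k3 = cache_key("https://www.example.com/itm?a=1&b=2", {"Accept-Language": "de"})
```

Here `k1` and `k2` are equal because host case and parameter order are normalized, while `k3` differs because the language header is part of the key. Choosing which headers to vary on is the main trade‑off: too few risks serving the wrong fragment, too many lowers the hit rate.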

PaaS Integration (Zebra) – Zebra provides APIs for creating, deleting, and synchronizing L7 rules across hardware and software load balancers, automating certificate management, and detecting configuration drift. An IngressTemplate (stored in a MongoDB‑backed CMS) abstracts Netscaler DSL rules, which are parsed by an ANTLR4 grammar into JSON.
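Configuration‑drift detection of the kind described above can be sketched as a comparison between the desired rule set and what a load balancer reports. This is an assumed model for illustration only: rules are represented as a plain dictionary keyed by rule name, and `detect_drift` is a hypothetical helper, not a Zebra API.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare desired rules with the rules found on a device.

    Both arguments map a rule name to its definition (any comparable value).
    """
    missing = sorted(desired.keys() - actual.keys())
    unexpected = sorted(actual.keys() - desired.keys())
    changed = sorted(
        name for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


desired = {
    "redirect-old-home": {"match": "/home", "action": "redirect:/"},
    "route-api": {"match": "/api", "action": "forward:api-pool"},
}
actual = {
    "route-api": {"match": "/api", "action": "forward:legacy-pool"},
    "debug-rule": {"match": "/debug", "action": "forward:debug-pool"},
}
report = detect_drift(desired, actual)
```

A report with all three lists empty would mean the device matches the desired state. Running the same comparison against both the hardware and the software load balancer is one way to keep the two in sync during a migration.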

Traffic Validation – After automated configuration, a traffic validator replays production request traces against the new software load balancer to ensure response‑code parity before cut‑over.
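A response‑code parity check like this can be sketched as follows. The `Trace` record and the `send` callback are assumptions: in practice `send` would issue the request to the new load balancer, but here it is injected as a function so the comparison logic can be shown without any network access.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Trace(NamedTuple):
    """One recorded production request and the status code it received."""
    method: str
    path: str
    expected_status: int


def validate(
    traces: Iterable[Trace], send: Callable[[str, str], int]
) -> List[Tuple[Trace, int]]:
    """Replay each trace through `send` and collect status-code mismatches."""
    mismatches = []
    for trace in traces:
        got = send(trace.method, trace.path)
        if got != trace.expected_status:
            mismatches.append((trace, got))
    return mismatches


# Stand-in for the new software load balancer (canned responses).
canned = {"/": 200, "/old": 302, "/missing": 404}


def fake_send(method: str, path: str) -> int:
    return canned[path]


traces = [
    Trace("GET", "/", 200),
    Trace("GET", "/old", 301),
    Trace("GET", "/missing", 404),
]
mismatches = validate(traces, fake_send)
ready_for_cutover = not mismatches
```

In this example the `/old` trace fails parity (301 recorded, 302 replayed), so `ready_for_cutover` is false and the rule would need review before traffic is switched.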

L7 Management Console – A unified UI allows operators to manage both hardware and software load balancers, define release strategies, and perform rollbacks during migration.

Custom Extensions – Contour and Envoy were extended to support prefix rewrite/strip‑query redirects, wildcard default routes, and custom HTTP filters for Drop/Reset actions. These patches have been upstreamed where possible.
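The intended behavior of a prefix‑rewrite redirect with query stripping can be shown with a small sketch. It models only the URL transformation, not the Contour or Envoy extension itself, and the function and parameter names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_location(
    url: str, match_prefix: str, new_prefix: str, strip_query: bool = False
) -> Optional[str]:
    """Return the redirect target for `url`, or None if the prefix does not match.

    Matching is a plain string-prefix test, so "/old" also matches "/older".
    """
    parts = urlsplit(url)
    if not parts.path.startswith(match_prefix):
        return None
    # Replace only the matched prefix and keep the rest of the path.
    path = new_prefix + parts.path[len(match_prefix):]
    query = "" if strip_query else parts.query
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


kept = redirect_location("https://example.com/old/item?id=1", "/old", "/new")
stripped = redirect_location(
    "https://example.com/old/item?id=1", "/old", "/new", strip_query=True
)
no_match = redirect_location("https://example.com/other", "/old", "/new")
```

With these inputs, `kept` preserves the query string, `stripped` drops it, and `no_match` is `None` because the path does not start with `/old`.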

Future Outlook – Ongoing work includes enabling anycast for public IPs, migrating data‑center hardware load balancers to an Istio‑based application gateway, and further automation of IngressRoute generation via federation controllers.

Overall, the UFES project demonstrates that a cloud‑native edge network can replace expensive hardware, achieve elastic scaling, improve latency, and provide a foundation for future service‑mesh‑driven architectures.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Cloud Native Technology Community

The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
