Why Service Mesh Matters: Lessons from Deploying Istio on UCloud UAEK
This article explains why UCloud introduced a Service Mesh, why Istio was chosen, and how it was adapted for the UAEK Kubernetes platform. It covers the IPv6 challenges, performance testing, and planned improvements, with practical lessons for building reliable micro‑service infrastructure.
1. Why Service Mesh is Needed
UCloud App Engine on Kubernetes (UAEK) is an internal, highly‑available, multi‑zone, auto‑scaling platform that aims to improve development efficiency and simplify operations. While Kubernetes eases deployment and scaling, micro‑service architectures introduce problems such as service discovery, monitoring, gray‑release control, overload protection, and request tracing.
To address these, UAEK required a Service Mesh that could provide:
Sidecar deployment with zero intrusion, fully decoupling governance code from business logic.
Integrated service discovery and load‑balancing with Kubernetes.
Real‑time, no‑restart traffic routing based on L7 information.
Unified data‑reporting API for monitoring and access control.
Distributed request‑trace system for rapid bug localization.
Overload protection that automatically triggers circuit breaking.
Pre‑deployment fault‑injection scenarios for disaster‑recovery drills.
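As a rough illustration of the "real‑time, no‑restart traffic routing based on L7 information" requirement, the sketch below matches a request header against a rule list that can be replaced while the process keeps running. The `Rule` and `Router` classes, the `x-user-group` header, and the `reviews-v1`/`reviews-v2` destinations are hypothetical names chosen for the example; this is not how Envoy or Pilot implement routing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """Send requests whose header equals `value` to `destination`."""
    header: str
    value: str
    destination: str


@dataclass
class Router:
    default: str                       # destination when no rule matches
    rules: List[Rule] = field(default_factory=list)

    def update(self, rules: List[Rule]) -> None:
        # Replace the whole rule list in one assignment; no restart needed.
        self.rules = list(rules)

    def route(self, headers: Dict[str, str]) -> str:
        # First matching rule wins, otherwise fall back to the default.
        for rule in self.rules:
            if headers.get(rule.header) == rule.value:
                return rule.destination
        return self.default


if __name__ == "__main__":
    router = Router(default="reviews-v1")
    print(router.route({"x-user-group": "beta"}))
    # Gray release: start sending the beta group to the new version.
    router.update([Rule("x-user-group", "beta", "reviews-v2")])
    print(router.route({"x-user-group": "beta"}))
```

In a mesh, the equivalent of `update` is driven by the control plane pushing new rules to the sidecars, so the business code never sees the change.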
2. Why Istio?
After research and testing, Istio was selected because it integrates natively with Kubernetes, separates the control and data planes, uses a sidecar model, builds on Envoy (a high‑performance C++11 proxy), requires no application code changes, and offers simple configuration with a complete API.
The Service Mesh consists of a data plane (Envoy sidecar injected into each pod) and a control plane (Pilot, Mixer, Citadel). Pilot watches Kubernetes for service information and distributes routing rules; Mixer provides policy enforcement and telemetry; Citadel handles authentication and RBAC.
3. Adapting Istio to the UAEK Environment
UAEK runs in a pure IPv6 network, but early Istio versions lacked full IPv6 support. The team modified the init script to add ip6tables rules, changed Pilot to listen on an IPv6 address, and fixed Envoy’s getsockopt call so that it correctly retrieves the original destination of redirected IPv6 traffic.
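To make the original‑destination lookup concrete, here is a minimal Python sketch of the same idea on Linux: after ip6tables redirects a connection, the proxy asks the kernel for the pre‑redirect address with getsockopt and decodes the returned `sockaddr_in6`. The numeric constants are the Linux values (`SOL_IPV6` is 41, `IP6T_SO_ORIGINAL_DST` is 80 in `linux/netfilter_ipv6/ip6_tables.h`); the function names are made up for the example, and this is not Envoy's actual C++ code.

```python
import ipaddress
import socket
import struct

SOL_IPV6 = 41               # Linux socket level for IPv6 options
IP6T_SO_ORIGINAL_DST = 80   # from linux/netfilter_ipv6/ip6_tables.h
SOCKADDR_IN6_LEN = 28       # family(2) port(2) flowinfo(4) addr(16) scope(4)


def parse_sockaddr_in6(raw: bytes):
    """Decode a raw sockaddr_in6 into (family, address, port)."""
    if len(raw) < SOCKADDR_IN6_LEN:
        raise ValueError("buffer too short for sockaddr_in6")
    (family,) = struct.unpack_from("=H", raw, 0)   # host byte order
    (port,) = struct.unpack_from("!H", raw, 2)     # network byte order
    address = ipaddress.IPv6Address(raw[8:24])     # skip 4-byte flowinfo
    return family, str(address), port


def original_dst_v6(conn: socket.socket):
    """Return the pre-redirect (address, port) of an accepted connection.

    Only meaningful on Linux for connections redirected by ip6tables.
    """
    raw = conn.getsockopt(SOL_IPV6, IP6T_SO_ORIGINAL_DST, SOCKADDR_IN6_LEN)
    _, address, port = parse_sockaddr_in6(raw)
    return address, port
```

The IPv4 path uses a different socket level and a 16‑byte `sockaddr_in`, which is why code written only for IPv4 returns an error or a wrong address on an IPv6 socket.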
After these patches, Envoy could accept TCP connections, but connections were reset because Envoy listened on an IPv4 address. Further source changes made Envoy listen on [::0]:15000, restoring normal operation.
Additional bugs in Pilot and Mixer related to IPv6 address handling caused array‑out‑of‑bounds crashes and were fixed one by one.
Performance Evaluation
Adding Envoy introduces an extra hop and a Mixer Policy check per request, which adds roughly 5 ms of latency in UAEK, acceptable for most services. However, Mixer Policy becomes a bottleneck at 2 000–3 000 QPS, where check latency spikes from 2‑3 ms to 100‑150 ms. The team therefore disabled Policy and dropped features such as the global QPS quota.
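One way to see why that latency spike matters is a back‑of‑the‑envelope estimate with Little's law (average in‑flight requests = arrival rate × time in system). Applying it here is an assumption for illustration, not something the team reported; it uses only the QPS and latency figures quoted above.

```python
def in_flight_checks(qps: int, latency_ms: int) -> float:
    """Average number of concurrent Policy checks at a given load.

    Little's law: concurrency = arrival rate (per second) * latency (seconds).
    """
    return qps * latency_ms / 1000


if __name__ == "__main__":
    # Healthy case versus saturated case, using the figures from the text.
    for qps, latency_ms in [(2000, 2), (3000, 150)]:
        pending = in_flight_checks(qps, latency_ms)
        print(f"{qps} QPS at {latency_ms} ms: about {pending:.0f} checks in flight")
```

Under this estimate the number of requests waiting on Mixer grows by roughly two orders of magnitude between the two cases, and every one of them holds a sidecar request open while it waits.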
Mixer Telemetry also suffered memory pressure above 2 000 QPS due to slow backend adapters. Removing the unused stdio logger alleviated the issue, and Istio 1.0’s Telemetry can handle >35 000 QPS.
4. Issues, Hopes and Future
Despite many challenges, a production‑ready Service Mesh is now running on UAEK. Ongoing work includes simplifying Istio onboarding, improving upgrade paths, and continuing to contribute fixes upstream. The team plans to expand to more regions, enhance the console, and add CI/CD automation.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
