Implementing a Cloud-Native Istio Gateway for 58.com Deep Learning Inference Platform
This article details the evolution of 58.com’s deep learning inference platform from its original SCF‑based architecture to a cloud‑native Istio gateway (Architecture 2.0), covering the design choices, traffic management, adaptive rate limiting, observability, model pre‑warming, and the resulting performance improvements.
The 15th China System Architect Conference (SACC2022) featured a talk by Wei Zhubin on the deep learning inference platform built by 58.com’s AI Lab, focusing on its cloud‑native gateway implementation using Istio.
Background: The platform, named WPAI, provides a unified GPU/CPU resource pool for offline training and online inference, supporting automatic scaling, mixed deployment, and a suite of algorithm application services (NLP, vision, ranking, etc.). Early on, inference services were built on a traditional SCF‑based gateway (Architecture 1.0).
Architecture 1.0: SCF acted as the API gateway, converting HTTP requests to gRPC for the model services. It offered service registration via Kubernetes watch, configuration sync via WConfig, and protocol‑conversion plugins packaged as JARs. While it gave the platform its first unified inference entry point, it suffered from complex onboarding, performance overhead from protocol conversion, memory pressure in Netty buffers, and tight coupling with third‑party libraries.
Motivation for Upgrade: Growing traffic, an increasing model count (1000+ models, 4000+ nodes), and the need for better scalability and observability prompted a redesign. The team chose Istio as the cloud‑native gateway to address these shortcomings.
Architecture 2.0: The new design separates three layers: model service, gateway, and business application. Istio’s control plane (Istiod) provides service discovery, traffic management, and security, while the data‑plane Ingress Gateway (Envoy) handles request routing, load balancing, and rate limiting. Sidecar injection was deliberately avoided for inference workloads to eliminate extra latency and resource overhead.
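Keeping inference pods sidecar-free is typically done with Istio’s standard injection annotation on the workload’s pod template. The Deployment fragment below is an illustrative sketch — the names, image, and port are assumptions, not details from the talk:

```yaml
# Illustrative Deployment fragment: the annotation opts this workload
# out of automatic sidecar injection, so only the shared Ingress
# Gateway (Envoy) sits on the request path.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server                       # hypothetical name
spec:
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
      annotations:
        sidecar.istio.io/inject: "false"   # skip the Envoy sidecar for this pod
    spec:
      containers:
      - name: model-server
        image: wpai/model-server:latest    # placeholder image
        ports:
        - containerPort: 9000              # gRPC serving port (assumed)
```

With injection disabled, the pod avoids the sidecar’s per-hop latency and memory cost; routing and rate limiting are handled entirely at the shared gateway.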
Traffic‑Management Enhancements: Istio’s native traffic‑management features (routing, load balancing, fault injection, etc.) replace the custom SCF logic. Multi‑tenant isolation is achieved by deploying dedicated Gateways per namespace where needed. Adaptive rate limiting is implemented via EnvoyFilters that adjust token‑bucket limits based on real‑time replica counts observed from the platform’s monitoring system.
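As a sketch of what such an EnvoyFilter can look like, the fragment below attaches Envoy’s local rate-limit filter to the ingress gateway with a fixed token bucket. In the adaptive scheme described above, a platform controller would patch the bucket parameters as replica counts change; all names and numbers here are illustrative:

```yaml
# Illustrative EnvoyFilter: inserts Envoy's local rate-limit filter
# into the ingress gateway's HTTP filter chain with a token bucket.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: inference-rate-limit       # hypothetical name
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          stat_prefix: http_local_rate_limiter
          token_bucket:
            max_tokens: 1000       # example ceiling; adjusted per replica count
            tokens_per_fill: 1000
            fill_interval: 1s
          filter_enabled:
            runtime_key: local_rate_limit_enabled
            default_value:
              numerator: 100
              denominator: HUNDRED
          filter_enforced:
            runtime_key: local_rate_limit_enforced
            default_value:
              numerator: 100
              denominator: HUNDRED
```

Making the limit adaptive then reduces to rewriting `max_tokens`/`tokens_per_fill` from the monitoring feedback loop, rather than hard-coding a global quota.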
Model Pre‑warming: To avoid cold‑start latency, the platform leverages Kubernetes startup and readiness probes. The startup probe holds the pod back until the model has finished loading, while the readiness probe controls when the service starts receiving live traffic. Model‑specific pre‑warm clients are generated from configuration files, enabling automatic warm‑up traffic before a node receives real requests.
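A minimal sketch of that probe setup, assuming the model server exposes hypothetical `/startup` and `/ready` HTTP endpoints (both are assumptions for illustration):

```yaml
# Illustrative container spec: the startup probe gates the other probes
# until the model is loaded; the readiness probe gates live traffic.
containers:
- name: model-server
  image: wpai/model-server:latest    # placeholder image
  startupProbe:
    httpGet:
      path: /startup                 # hypothetical: returns 200 once weights are in memory
      port: 8080
    periodSeconds: 5
    failureThreshold: 60             # allow up to 5 minutes for large models to load
  readinessProbe:
    httpGet:
      path: /ready                   # hypothetical: returns 200 once warm-up requests complete
      port: 8080
    periodSeconds: 5
```

Wiring the readiness endpoint to the generated pre-warm client means a replica only joins the gateway’s load-balancing pool after its warm-up traffic has completed.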
Observability: Instead of relying on Istio’s sidecar metrics, the team collects structured JSON access logs from the Envoy gateway and unstructured logs from the inference services, forwards them to Kafka, and processes them with Flink. Metrics, logs, and traces are visualized in Grafana and Elasticsearch, providing fine‑grained monitoring at the department, task, and replica levels.
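Structured gateway access logs of this kind can be enabled through Istio’s mesh configuration. The field selection below is a hedged example built from standard Envoy command operators, not the platform’s actual log schema:

```yaml
# Illustrative mesh-config fragment: emit gateway access logs as JSON
# to stdout, where a log agent can ship them to Kafka.
meshConfig:
  accessLogFile: /dev/stdout
  accessLogEncoding: JSON
  accessLogFormat: |
    {
      "start_time": "%START_TIME%",
      "authority": "%REQ(:AUTHORITY)%",
      "method": "%REQ(:METHOD)%",
      "path": "%REQ(:PATH)%",
      "response_code": "%RESPONSE_CODE%",
      "duration_ms": "%DURATION%",
      "upstream_host": "%UPSTREAM_HOST%"
    }
```

Because each field is a named JSON key, the downstream Flink jobs can aggregate by authority or upstream host to produce the department-, task-, and replica-level views described above.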
Results: The 2.0 architecture cut inference latency by over 50%, improved stability through resource isolation, and simplified traffic governance with Istio’s built‑in capabilities. The platform now serves over 1000 models and handles billions of requests per day at peak.
Conclusion: By adopting a cloud‑native Istio gateway, 58.com’s inference platform achieved significant performance, scalability, and observability gains, and the team plans to continue leveraging emerging Kubernetes and Istio features to further enhance the system.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.