Evolution of 58 Group's Container Cloud Platform: Architecture, Network, and Service Discovery
This article details the two‑year evolution of 58 Group’s container cloud platform, describing its background, overall architecture, network and service‑discovery advancements, operational challenges, and lessons learned from migrating over 90% of the company’s traffic to a Docker‑Kubernetes‑based cloud environment.
Preface
The 58 Cloud Computing Platform (referred to as the 58 Cloud) has been in production within 58 Group for more than two years. During this period the team combined container characteristics with real business scenarios, substantially reworked the platform architecture, and moved the group's online traffic onto the cloud.
This article is based on a presentation by the platform lead at the GITC 2018 conference. It focuses on the platform background, two core components, their technical evolution, and specific cases and solutions encountered during the business cloudification process.
Background
Containerization was adopted to solve three main problems: low resource utilization (aiming for a 3‑4× increase), slow service scaling (reducing scaling time from hours to minutes), and unstandardized release processes (enforcing a unified image template).
The team selected Docker and Kubernetes as the foundation after extensive evaluation.
Overall Architecture
The platform’s overall architecture is illustrated below (image omitted). It supports more than 90% of the group's business traffic and integrates tightly with internal systems for project management, code management, CMDB, monitoring, and service governance.
Four environments share a single image repository, so a given code version flows across environments as one unique image.
Architecture Evolution
Network Architecture
The team compared six common container networking models and chose a "bridge+VLAN" base model. A custom Docker CNM plugin was developed to enable shared container subnets across hosts and IP reuse.
To meet the business demand for fixed IP addresses, the network was upgraded in April 2018 using the Kubernetes CNI interface with an IP controller module. This ensures IP stability during scaling and leverages the data‑center network to route container subnets across switches.
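The article does not describe how the IP controller works internally. As a minimal sketch of the idea, assuming a hypothetical allocator that binds each (service, replica index) slot to one address and returns it to the pool only on explicit release, fixed IPs during scaling could look like this. The class name, method names, and the subnet are illustrative assumptions.

```python
import ipaddress


class IPController:
    """Hypothetical fixed-IP allocator (not the platform's actual module)."""

    def __init__(self, cidr):
        # hosts() skips the network and broadcast addresses of the subnet.
        self._free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self._bound = {}  # (service, replica index) -> IP

    def acquire(self, service, index):
        """Return the IP bound to this slot, allocating one on first use."""
        slot = (service, index)
        if slot not in self._bound:
            if not self._free:
                raise RuntimeError("container subnet exhausted")
            self._bound[slot] = self._free.pop(0)
        return self._bound[slot]

    def release(self, service, index):
        """Give the IP back only when the slot is removed for good."""
        ip = self._bound.pop((service, index), None)
        if ip is not None:
            self._free.append(ip)
```

Under this assumption, a container that is restarted or rescheduled calls `acquire` with the same slot and gets the same address, while scaling in calls `release` so the address can be reused.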
Network throttling was implemented using an in-house monitoring tool and the Linux tc utility. By applying limits in both directions on the paired veth / eth0 interfaces, the platform supports bandwidth control that can be adjusted dynamically, takes effect within seconds, and scales elastically.
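As a rough illustration of limiting both directions with tc, the sketch below only builds the command lines and does not run them. It assumes a token bucket filter (tbf) qdisc; the article does not say which qdisc the platform uses. The device names, the container PID, and the latency value are placeholders. Shaping egress on the host-side veth limits traffic flowing into the container, and shaping egress on the container's eth0 limits traffic leaving it.

```python
def tbf_args(dev, rate_mbit, latency_ms=50):
    """Build a `tc qdisc replace ... tbf` command for one interface."""
    # The tbf bucket should hold at least rate/HZ bytes; HZ=100 is assumed
    # here as a conservative value, with a 32 KiB floor.
    burst_bytes = max(32 * 1024, rate_mbit * 1_000_000 // 8 // 100)
    return ["tc", "qdisc", "replace", "dev", dev, "root", "tbf",
            "rate", f"{rate_mbit}mbit",
            "burst", str(burst_bytes),
            "latency", f"{latency_ms}ms"]


def throttle_commands(host_veth, container_pid, rate_mbit):
    """Return commands limiting traffic into and out of one container."""
    # Host side: egress on the veth peer is the container's inbound traffic.
    inbound = tbf_args(host_veth, rate_mbit)
    # Container side: eth0 lives in the container's network namespace,
    # so the command is wrapped in nsenter (-t PID, -n for the net ns).
    outbound = ["nsenter", "-t", str(container_pid), "-n"] + tbf_args("eth0", rate_mbit)
    return [inbound, outbound]
```

Because `tc qdisc replace` overwrites the existing root qdisc, re-issuing the commands with a new rate is enough to change the limit on a running container.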
Service Discovery
Service discovery is a core component that connects traffic to services. The platform needed automatic load‑balancer updates as IPs changed. After evaluating Kubernetes’ native discovery and finding it insufficient for complex load‑balancing and node‑drain scenarios, the team adopted Consul with a proxy layer.
The proxy watches Kubernetes events, updates Consul in real time, and performs health checks, allowing any language (Java, PHP, Node.js, Go, etc.) to register services without code changes. This makes registration transparent to developers and reduces security exposure.
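The translation step performed by such a proxy can be sketched as a pure function that turns one pod event into the corresponding Consul agent API request. The event shape used here is a simplified, hypothetical one (a `type` plus an `object` holding `name` and `ip`), not the full Kubernetes watch payload, and the service name, port, and check interval are assumptions. The register and deregister paths are Consul's agent service endpoints.

```python
import json


def consul_request(event, service_name, port, check_interval="10s"):
    """Map a simplified pod event to (method, path, body) for Consul."""
    pod = event["object"]
    service_id = f"{service_name}-{pod['name']}"

    # A deleted pod, or one that has no IP yet, must not receive traffic.
    if event["type"] == "DELETED" or not pod.get("ip"):
        return ("PUT", f"/v1/agent/service/deregister/{service_id}", None)

    body = {
        "ID": service_id,
        "Name": service_name,
        "Address": pod["ip"],
        "Port": port,
        # Consul probes the TCP port itself, so the service needs no SDK.
        "Check": {"TCP": f"{pod['ip']}:{port}", "Interval": check_interval},
    }
    return ("PUT", "/v1/agent/service/register", json.dumps(body, sort_keys=True))
```

Since registration happens outside the container, the service code stays unaware of Consul, which is what makes the approach independent of the service's language.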
Review and Reflection
Key design considerations included choosing containers over VMs, enforcing a single-process-per-container policy, and eliminating agents inside containers to reduce overhead on hosts.
Early challenges such as high CPU usage during service startup were mitigated by introducing warm-up strategies on both the caller and callee sides. Monitoring was enhanced from per-minute sampling to per-second container metrics so that short-lived spikes are captured.
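The article does not spell out the warm-up strategy. One common caller-side approach, shown here purely as an assumption, is to ramp a new instance's load-balancing weight linearly with its uptime so that it receives little traffic while caches and JIT compilation are still cold. The 60-second window and the weight range are illustrative.

```python
def warmup_weight(uptime_s, warmup_s=60, full_weight=100, min_weight=1):
    """Linearly ramp an instance's weight during its warm-up window."""
    if uptime_s >= warmup_s:
        return full_weight
    # Keep a small floor so the instance still sees some traffic to warm up.
    return max(min_weight, int(full_weight * uptime_s / warmup_s))
```

A caller would recompute the weight each time it refreshes its instance list, for example `warmup_weight(30)` for an instance that started 30 seconds ago.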
Additional controls were added: per‑container thread limits, host‑level overload protection that migrates containers when the host load is high, and disabling swap to avoid latency jitter.
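Host-level overload protection implies some policy for choosing which containers to move. The sketch below assumes a simple greedy rule (migrate the heaviest containers first until the projected host load falls to the threshold). The article does not state the actual policy, and a real one might instead prefer low-priority or stateless containers.

```python
def pick_migrations(host_load, threshold, containers):
    """Choose containers to migrate so projected load drops to the threshold.

    containers is a list of (name, load_contribution) pairs.
    """
    picked = []
    remaining = host_load
    # Heaviest first, so as few containers as possible are disturbed.
    for name, load in sorted(containers, key=lambda c: c[1], reverse=True):
        if remaining <= threshold:
            break
        picked.append(name)
        remaining -= load
    return picked
```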
The platform's core software versions evolved over time (timeline image omitted).
Afterword
58 Group's container cloud practice involved extensive technology selection, optimization, and architectural evolution to fit the business scenarios of a large internet company. Despite starting later than many startups, the platform moved the majority of traffic to the cloud within a year and continues to onboard new services and address future challenges.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.
