
Insights from the 58 Group Technical Salon: Cloud Platform Architecture Practices by Zhihu and 58

The article summarizes the 58 Group technical salon where Zhihu and 58 teams shared their cloud platform evolution, containerization strategies, multi‑cluster management, network architecture, service discovery, and real‑world case studies, highlighting challenges and solutions for large‑scale Kubernetes deployments.

58 Tech

On February 27, 2019, the 58 Group Technical Salon (Session 9 – “Cloud Platform Architecture”) was held at the Beijing headquarters, featuring presentations by Zhihu’s container platform team and 58’s TEG cloud platform team on their container‑cloud practices.

Zhihu Cloud Platform Practice

Zhihu began containerizing production workloads in 2015, using Mesos initially and migrating to Kubernetes by the end of 2017. The platform now runs business containers as well as infrastructure services such as Kafka and HBase.

Zhihu's service framework relies on Consul for service registration and discovery and on HAProxy for load balancing, which also provides rate limiting and circuit breaking.
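For readers unfamiliar with Consul, the sketch below builds the JSON body that an instance would send to a Consul agent's PUT /v1/agent/service/register endpoint. It shows only the registration step. The service name, IP, port, health-check path, and 10-second interval are hypothetical, and the talk did not describe Zhihu's actual registration code.

```python
import json


def consul_registration(service: str, instance_id: str, ip: str, port: int,
                        health_path: str = "/health") -> str:
    """Build a JSON body for Consul's PUT /v1/agent/service/register endpoint.

    The health path and check interval are illustrative assumptions.
    """
    payload = {
        "ID": instance_id,   # unique per container instance
        "Name": service,     # logical service name that load-balancer backends group by
        "Address": ip,       # container IP reachable from the proxy layer
        "Port": port,
        "Check": {           # Consul marks the instance critical if this HTTP check fails
            "HTTP": f"http://{ip}:{port}{health_path}",
            "Interval": "10s",
        },
    }
    return json.dumps(payload)
```

A proxy layer generating HAProxy backends can then ask Consul for the healthy instances of each service name.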

To overcome Kubernetes cluster size limits, Zhihu supports multi‑cluster management, horizontal scaling, cross‑cluster disaster recovery, and hybrid‑cloud integration for burst traffic.
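The talk did not describe how Zhihu places workloads across clusters. As a hypothetical illustration of how multi-cluster management supports both horizontal scaling and cross-cluster disaster recovery, the sketch splits a service's replicas across clusters in proportion to each cluster's capacity and hands an unhealthy cluster's share to the survivors. Cluster names and capacities are invented.

```python
from math import floor


def distribute_replicas(total: int, capacity: dict[str, int],
                        healthy: set[str]) -> dict[str, int]:
    """Split `total` replicas across healthy clusters in proportion to capacity.

    Unhealthy clusters get zero replicas, so their share fails over to the rest.
    The largest-remainder method keeps the counts summing exactly to `total`.
    """
    live = {name: cap for name, cap in capacity.items() if name in healthy and cap > 0}
    if not live:
        raise ValueError("no healthy cluster available")
    weight = sum(live.values())
    exact = {name: total * cap / weight for name, cap in live.items()}
    result = {name: floor(share) for name, share in exact.items()}
    leftover = total - sum(result.values())
    # Give the remaining replicas to the clusters with the largest fractional parts.
    for name in sorted(exact, key=lambda n: exact[n] - result[n], reverse=True)[:leftover]:
        result[name] += 1
    return {name: result.get(name, 0) for name in capacity}
```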

Case Study – etcd Failure

In large clusters, the high volume of events overloaded etcd and caused outages. The fixes were to isolate events in a separate etcd cluster, clean them up regularly, and upgrade the storage to SSDs.
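One standard way to isolate events in upstream Kubernetes is the kube-apiserver flag --etcd-servers-overrides, which routes a single resource to a different etcd cluster, while --event-ttl (default 1h0m0s) bounds how long events are retained. Whether Zhihu used exactly these flags was not stated. The sketch below only composes the flag strings, and the etcd URLs are placeholders.

```python
def apiserver_etcd_flags(main_etcd: list[str], events_etcd: list[str],
                         event_ttl: str = "1h0m0s") -> list[str]:
    """Compose kube-apiserver flags that keep Event objects in their own etcd cluster.

    --etcd-servers-overrides takes "group/resource#url1;url2"; the core API group
    is empty, hence "/events". --event-ttl controls event retention.
    """
    return [
        "--etcd-servers=" + ",".join(main_etcd),
        "--etcd-servers-overrides=/events#" + ";".join(events_etcd),
        "--event-ttl=" + event_ttl,
    ]
```

A shorter TTL (for example "30m") trades event history for less etcd load.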

Case Study – Kubernetes Eviction

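When many nodes lose their heartbeat at the same time, the node controller can evict their pods en masse and trigger a massive container migration. Setting the kube-controller-manager parameter "unhealthy-zone-threshold" limits the scope of eviction in such cases and reduces the impact.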
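The sketch below is a simplified model of how the upstream node lifecycle controller combines unhealthy-zone-threshold (default 0.55) with node-eviction-rate, secondary-node-eviction-rate, and large-cluster-size-threshold. It reflects our reading of the upstream logic, not Zhihu's configuration, and omits details such as taint-based eviction and differences between Kubernetes versions.

```python
NORMAL, PARTIAL, FULL = "Normal", "PartialDisruption", "FullDisruption"


def zone_state(ready: int, not_ready: int, unhealthy_zone_threshold: float = 0.55) -> str:
    """Classify a zone roughly as the node lifecycle controller does (simplified)."""
    if ready == 0 and not_ready > 0:
        return FULL
    # More than two NotReady nodes AND the NotReady fraction reaches the threshold.
    if not_ready > 2 and not_ready / (ready + not_ready) >= unhealthy_zone_threshold:
        return PARTIAL
    return NORMAL


def eviction_rate(state: str, zone_size: int, all_zones_down: bool = False,
                  node_eviction_rate: float = 0.1,
                  secondary_node_eviction_rate: float = 0.01,
                  large_cluster_size_threshold: int = 50) -> float:
    """Nodes per second whose pods may be evicted in this zone (simplified)."""
    if all_zones_down:
        # Every zone looks dead: more likely a control-plane or network fault, so stop.
        return 0.0
    if state == PARTIAL:
        # Slow eviction down in large zones and stop it entirely in small ones.
        return secondary_node_eviction_rate if zone_size > large_cluster_size_threshold else 0.0
    return node_eviction_rate
```

With the defaults, a 100-node zone in which 60 nodes lose their heartbeat is classed as partially disrupted, so eviction slows from 0.1 to 0.01 nodes per second instead of draining the zone.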

Infrastructure Containerization – Kafka

Kafka was containerized using HostPath storage and a custom LocalPV resource with a disk-aware scheduler. An API call creates a Kafka broker; the scheduler then selects a suitable node and disk, writes the LocalPV and Pod objects to etcd, and monitors the pod's status to handle faults.
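The placement rules of Zhihu's disk-aware scheduler were not detailed. The following hypothetical sketch shows the kind of decision such a scheduler makes: keep only disks with enough free space, avoid placing two brokers of the same Kafka cluster on one node, and reserve the chosen disk before the broker's storage and pod are created. The Disk type, mount paths, and most-free-space tie-break are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Disk:
    node: str
    path: str                 # HostPath mount such as /data1 (hypothetical)
    free_gb: int
    brokers: set[str] = field(default_factory=set)  # Kafka clusters with a broker here


def pick_disk(disks: list[Disk], kafka_cluster: str, need_gb: int) -> Optional[Disk]:
    """Choose a disk for a new broker of `kafka_cluster`, or None if nothing fits.

    Assumed rules: enough free space, and at most one broker of a Kafka cluster
    per node so a node failure costs at most one replica. Prefer the emptiest disk.
    """
    used_nodes = {d.node for d in disks if kafka_cluster in d.brokers}
    candidates = [d for d in disks if d.free_gb >= need_gb and d.node not in used_nodes]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.free_gb)
    best.free_gb -= need_gb           # reserve capacity so concurrent requests do not collide
    best.brokers.add(kafka_cluster)
    return best
```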

Future Outlook (Zhihu)

Further infrastructure containerization and server utilization optimization are planned.

58 Cloud Platform Practice

58's cloud platform project started in early 2017 to address low resource utilization, slow scaling, and inconsistent release processes. Built on containers and Kubernetes, the platform now serves over 2,000 services, runs on 430 physical machines, and hosts tens of thousands of containers.

Network Architecture

The platform adopts a "bridge+VLAN" network model with a custom IP controller that gives services fixed IPs, and integrates with Tencent data-center networking to provide full-mesh routing between containers.
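A hypothetical sketch of the fixed-IP idea: the allocator binds an address to a stable instance key rather than to a container ID, so a container that is rebuilt or rescheduled within the same VLAN gets its old IP back. The subnet, key format, and the rule of reserving the first host address for the gateway are assumptions, not details of 58's IP controller.

```python
import ipaddress


class FixedIPPool:
    """Hypothetical fixed-IP allocator for one bridge+VLAN subnet.

    IPs are bound to a stable instance key (e.g. "<service>-<index>"), not to a
    container ID, so a rescheduled container receives the same address.
    """

    def __init__(self, cidr: str, reserved: int = 1):
        hosts = list(ipaddress.ip_network(cidr).hosts())
        self._free = hosts[reserved:]          # skip e.g. the VLAN gateway address
        self._bound: dict[str, ipaddress.IPv4Address] = {}

    def allocate(self, key: str) -> str:
        if key in self._bound:                 # same instance key -> same IP
            return str(self._bound[key])
        if not self._free:
            raise RuntimeError("subnet exhausted")
        ip = self._free.pop(0)
        self._bound[key] = ip
        return str(ip)

    def release(self, key: str) -> None:
        """Return an IP to the pool only when the instance is deleted for good."""
        ip = self._bound.pop(key, None)
        if ip is not None:
            self._free.append(ip)
```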

Network Rate Limiting

58 implemented bidirectional traffic shaping by applying tc limits to both ends of each container's veth pair, which enables dynamic, per-second, and elastic bandwidth control.
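Because tc shapes the traffic an interface sends, limiting both directions means putting a qdisc on each end of the veth pair. The sketch below only builds the commands. It assumes a token bucket filter (tbf) qdisc, a named network namespace reachable with "ip netns exec", and placeholder burst and latency values; 58's actual qdisc choice and parameters were not given.

```python
def veth_shaping_cmds(host_veth: str, netns: str, container_if: str,
                      ingress_mbit: int, egress_mbit: int) -> list[list[str]]:
    """Build tc commands that cap both directions of a container's veth pair.

    tc shapes what an interface sends, so:
      - a qdisc on the host-side veth limits traffic flowing INTO the container;
      - a qdisc on the container-side peer (inside its netns) limits traffic OUT.
    "replace" creates the qdisc or updates it in place, so limits can change live.
    """
    def tbf(dev: str, mbit: int) -> list[str]:
        # burst/latency are placeholder values, not tuned recommendations
        return ["tc", "qdisc", "replace", "dev", dev, "root", "tbf",
                "rate", f"{mbit}mbit", "burst", "256kb", "latency", "50ms"]

    return [
        tbf(host_veth, ingress_mbit),
        ["ip", "netns", "exec", netns] + tbf(container_if, egress_mbit),
    ]
```

With Docker-managed containers, which do not create named namespaces, the second command would typically be run through "nsenter -t <pid> -n" instead. Re-running the same "tc qdisc replace" command with a new rate is one way to adjust bandwidth dynamically.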

Service Discovery

For service discovery, 58 also uses Consul for decoupled service registration and HAProxy for load balancing. A proxy layer watches Kubernetes events and keeps the IP mappings up to date, so services written in any language can join without code changes.
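The proxy layer can be pictured as a small reconciler: it consumes pod watch events (Kubernetes reports them as ADDED, MODIFIED, and DELETED) and keeps a service-to-IP table from which load-balancer configuration is regenerated. The sketch is hypothetical: events are pre-reduced to plain fields, and it renders an HAProxy backend section directly, whereas 58's proxy may instead update Consul.

```python
from collections import defaultdict


class BackendTable:
    """Hypothetical proxy-layer state fed by Kubernetes pod watch events."""

    def __init__(self) -> None:
        self.backends: dict[str, dict[str, str]] = defaultdict(dict)  # service -> {pod: ip}

    def apply(self, event_type: str, service: str, pod: str, ip: str, ready: bool) -> None:
        if event_type not in ("ADDED", "MODIFIED", "DELETED"):
            return  # e.g. BOOKMARK or ERROR events carry no endpoint change here
        if event_type == "DELETED" or not ready or not ip:
            self.backends[service].pop(pod, None)   # drop pods that are gone or not ready
        else:
            self.backends[service][pod] = ip        # new pod or changed IP

    def haproxy_backend(self, service: str, port: int) -> str:
        """Render an HAProxy backend section for one service."""
        lines = [f"backend {service}", "    balance roundrobin"]
        for pod, ip in sorted(self.backends[service].items()):
            lines.append(f"    server {pod} {ip}:{port} check")
        return "\n".join(lines)
```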

Case Study – Load Isolation

58 introduced container-level thread caps and host-level overload protection so that an overloaded host cannot degrade the other services running on it.
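The talk did not give 58's thresholds or mechanisms. As a hypothetical sketch, a per-container thread cap can be derived from the container's CPU limit and enforced through the cgroup pids controller (the pids.max file, or the kubelet's --pod-max-pids flag), while host-level protection can compare the 1-minute load average per CPU with a threshold. Every number below is an assumption.

```python
import math


def container_pids_limit(cpu_limit: float, threads_per_cpu: int = 500,
                         minimum: int = 1024) -> int:
    """Thread/process cap for a container, scaled by its CPU quota (assumed policy).

    The result would be written to the container cgroup's pids.max file.
    """
    return max(minimum, math.ceil(cpu_limit * threads_per_cpu))


def host_overloaded(load1: float, cpu_count: int, max_load_per_cpu: float = 1.5) -> bool:
    """True if the 1-minute load average per CPU exceeds the threshold.

    A scheduler extension could stop placing new containers on such a host
    (for example by cordoning it) until the load drops.
    """
    return load1 / cpu_count > max_load_per_cpu
```

On a live Linux host, host_overloaded(os.getloadavg()[0], os.cpu_count()) would evaluate the current load.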

Case Study – Swap‑Induced Latency

58 disabled swap partitions to eliminate the random latency spikes seen in the early stages of the cloud migration.
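A quick way to confirm that a node has no active swap is to parse /proc/swaps, which lists one active swap area per line after a header. The helper below is an illustrative sketch, not 58's tooling. Note also that the kubelet refuses to start while swap is on unless --fail-swap-on=false is set.

```python
def active_swap_devices(proc_swaps: str) -> list[str]:
    """Parse the contents of /proc/swaps and return the active swap devices/files.

    The first line is a header (Filename Type Size Used Priority); each further
    non-empty line describes one active swap area.
    """
    lines = proc_swaps.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]
```

If the list is non-empty, swap can be turned off with "swapoff -a" and kept off by removing the swap entry from /etc/fstab.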

Future Outlook (58)

Plans include intelligent scheduling, stateful service support, and deeper hybrid‑cloud integration.

Conclusion

The salon facilitated deep technical exchange between Zhihu and 58, revealing common challenges in cloud‑native transformation and sharing targeted solutions that advance large‑scale container cloud adoption.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: cloud-native, Multi-Cluster, containerization, network-architecture, service-discovery