
Overview of ZuanZuan Cloud Platform: Architecture, Image Management, Release Upgrade, Container Monitoring, and Log Collection

The article introduces the ZuanZuan cloud platform: its overall architecture, an image-management workflow that hides Dockerfiles from developers, release-upgrade strategies built on a custom controller, the evolution of container monitoring toward Prometheus, and a log-collection mechanism built to handle the large log volumes that Java services produce.

Wukong Talks Architecture

Overall Architecture

The ZuanZuan cloud platform consists of four main parts: image management, release upgrade, container monitoring, and log collection. Together they provide a unified control plane that abstracts away the underlying components (image storage, logging, Kubernetes, service governance, NG governance, and monitoring), reducing friction for developers and shortening their learning curve.

Function Flow

A diagram of the system's feedback loop illustrates user behavior and data flow. Developers compile and deploy through a CI/CD platform, query logs through a logging platform, and configure log collection through a big-data platform. They also work with the cloud platform, which in turn handles all interaction with Kubernetes.

Image Management

The goal is to hide Dockerfiles from developers entirely. The platform provides a base image for each technology stack (e.g., ZZJAVA, ZZWEB, WF) with a predefined startup script. As a result, the build artifacts already produced for physical-machine deployment can be reused in containers unchanged. Image builds run in pods on a Kubernetes cluster, which distributes the build work and removes the build-performance bottleneck.
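The sketch below shows how a stack-specific base image lets the platform generate Dockerfiles on the developer's behalf. It maps each stack to a base image and emits a two-line Dockerfile. The registry host, the image tags, and the /opt/app directory are hypothetical. The base images are assumed to carry the predefined startup script as their entrypoint.

```python
# Hypothetical base-image registry: the stack names (ZZJAVA, ZZWEB, WF) come
# from the article, but the Harbor host, paths, and tags are placeholders.
BASE_IMAGES = {
    "ZZJAVA": "harbor.example.com/base/zzjava:1.0",
    "ZZWEB": "harbor.example.com/base/zzweb:1.0",
    "WF": "harbor.example.com/base/wf:1.0",
}


def render_dockerfile(stack: str, artifact: str, app_dir: str = "/opt/app") -> str:
    """Generate the Dockerfile a developer never has to write.

    Assumes each base image already contains the stack's predefined startup
    script as its ENTRYPOINT/CMD, so the generated file only adds the artifact.
    """
    if stack not in BASE_IMAGES:
        raise ValueError(f"unsupported technology stack: {stack!r}")
    return "\n".join([
        f"FROM {BASE_IMAGES[stack]}",
        # Reuse the same artifact that would be deployed to a physical machine.
        f"COPY {artifact} {app_dir}/",
    ]) + "\n"


print(render_dockerfile("ZZJAVA", "order-service.jar"), end="")
```

Keeping the startup logic in the base image means a change to the startup script ships through a base-image rebuild, not through every application's Dockerfile.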

The workflow: developers compile through CI/CD, and the resulting artifacts are pushed to an FTP service. The cloud platform then creates a compilation pod, which builds the Docker image with a Docker client and pushes the resulting image to Harbor.
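The workflow above can be sketched as a one-shot Pod manifest that the cloud platform might submit for each build. This is an illustration under stated assumptions, not the platform's actual spec. The builder image name, the DOCKERFILE environment variable, and mounting the host's /var/run/docker.sock so the Docker client can reach a daemon are all hypothetical choices. Harbor credentials are assumed to be provisioned separately.

```python
import shlex


def build_pod_manifest(app: str, version: str, artifact_url: str,
                       dockerfile: str, harbor_repo: str) -> dict:
    """One-shot build pod: fetch artifact, docker build, docker push to Harbor."""
    image = f"{harbor_repo}/{app}:{version}"
    script = " && ".join([
        "mkdir -p /workspace",
        "cd /workspace",
        f"curl -fsSO {shlex.quote(artifact_url)}",  # artifact pushed to FTP by CI/CD
        'printf "%s" "$DOCKERFILE" > Dockerfile',  # platform-generated Dockerfile
        f"docker build -t {shlex.quote(image)} .",
        f"docker push {shlex.quote(image)}",
    ])
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"generateName": f"build-{app}-", "labels": {"role": "image-build"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "builder",
                # Hypothetical image assumed to ship sh, curl, and the docker CLI.
                "image": "harbor.example.com/tools/builder:latest",
                "command": ["sh", "-c", script],
                "env": [{"name": "DOCKERFILE", "value": dockerfile}],
                "volumeMounts": [{"name": "docker-sock",
                                  "mountPath": "/var/run/docker.sock"}],
            }],
            "volumes": [{"name": "docker-sock",
                         "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}}],
        },
    }


manifest = build_pod_manifest(
    "order", "v1", "ftp://ftp.example.com/order/order.jar",
    "FROM harbor.example.com/base/zzjava:1.0\nCOPY order.jar /opt/app/\n",
    "harbor.example.com/prod",
)
print(manifest["spec"]["containers"][0]["command"][2])
```

Each build gets its own short-lived pod, so the scheduler spreads concurrent builds across the cluster instead of queueing them on one build host.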

Release Upgrade

This part covers two topics. The first is the evolution of ZuanZuan's release technology, from the standard Deployment controller to a custom controller. The second is the implementation details for each service type (RPC and Web). Concerns shared by both service types are CPU over-commit, readiness checks, and affinity strategy, and each service type also needs its own specific implementation.

CPU Over‑commit

Designated nodes are labeled, and pods use a nodeSelector to land on matching hosts. The CPU request is set as a fixed proportion of the CPU limit, which controls the over-commit ratio.
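The Kubernetes scheduler places pods according to their requests. Setting request = r × limit therefore lets the CPU limits on a node add up to 1/r times its allocatable CPU; for example, r = 0.5 permits a 2x over-commit. The sketch below builds the matching resources and nodeSelector fragment. The label key zz.example.com/pool and the pool names are hypothetical.

```python
# Hypothetical node label used to group designated hosts into pools.
POOL_LABEL = "zz.example.com/pool"


def cpu_resources(limit_millicores: int, request_ratio: float) -> dict:
    """CPU request = request_ratio * limit; the over-commit ratio is 1 / request_ratio."""
    if limit_millicores <= 0:
        raise ValueError("CPU limit must be positive")
    if not 0 < request_ratio <= 1:
        raise ValueError("request_ratio must be in (0, 1]")
    request = max(1, round(limit_millicores * request_ratio))
    return {
        "requests": {"cpu": f"{request}m"},
        "limits": {"cpu": f"{limit_millicores}m"},
    }


def pod_spec_fragment(pool: str, limit_millicores: int, request_ratio: float) -> dict:
    """Pin the pod to a labeled node pool and apply the over-commit ratio."""
    return {
        "nodeSelector": {POOL_LABEL: pool},
        "containers": [{
            "name": "app",
            "resources": cpu_resources(limit_millicores, request_ratio),
        }],
    }


# A 4-core limit with a 0.5 ratio requests 2 cores, allowing 2x over-commit.
print(pod_spec_fragment("rpc", 4000, 0.5))
```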

Readiness Checks

Web services are considered ready once their health endpoint returns HTTP 200. RPC services are considered ready once their service port is successfully bound, at which point they register with the service-governance system.
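The two readiness rules map onto Kubernetes' built-in probe types. Web services use an httpGet probe. RPC services use a tcpSocket probe, which succeeds once a TCP connection to the port can be opened. Kubernetes accepts any status from 200 to 399 as success for httpGet. The /health path and the timing values below are assumptions.

```python
def readiness_probe(service_type: str, port: int, health_path: str = "/health") -> dict:
    """Kubernetes readinessProbe for the two service types described above."""
    timing = {"initialDelaySeconds": 10, "periodSeconds": 5, "failureThreshold": 3}
    if service_type == "web":
        # Ready once the health endpoint answers (the article's endpoint returns 200).
        return {"httpGet": {"path": health_path, "port": port}, **timing}
    if service_type == "rpc":
        # Ready once a TCP connection to the bound service port succeeds.
        return {"tcpSocket": {"port": port}, **timing}
    raise ValueError(f"unknown service type: {service_type!r}")


print(readiness_probe("web", 8080))
print(readiness_probe("rpc", 9090))
```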

Affinity Strategy

PodAntiAffinity is employed to spread pods across different hosts, preventing a single host failure from taking down the entire service.
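The following is a minimal sketch of such an anti-affinity rule, keyed on the standard kubernetes.io/hostname node label. The article does not say whether the rule is hard (required) or soft (preferred), so both variants are shown. The app label is a hypothetical convention.

```python
def spread_across_hosts(app: str, required: bool = False) -> dict:
    """Pod anti-affinity that keeps pods of the same app on different nodes."""
    term = {
        "labelSelector": {"matchLabels": {"app": app}},
        # Each distinct node hostname is its own topology domain.
        "topologyKey": "kubernetes.io/hostname",
    }
    if required:
        # Hard rule: a pod stays Pending if no host without a sibling remains.
        rule = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        # Soft rule: the scheduler avoids co-location but may fall back to it.
        rule = {"preferredDuringSchedulingIgnoredDuringExecution": [
            {"weight": 100, "podAffinityTerm": term}]}
    return {"podAntiAffinity": rule}


print(spread_across_hosts("order-service"))
```

The soft variant keeps deployments schedulable when a service has more replicas than available hosts. The hard variant guarantees spread at the cost of possibly Pending pods.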

RPC Service

The challenges are supporting node grouping and integrating with service governance. The solution sets each pod's group ID through an environment variable and adapts the RPC framework so that service discovery is group-aware.
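The sketch below illustrates grouped discovery under simple assumptions. The platform injects a hypothetical RPC_GROUP_ID variable into the pod, and the RPC framework reads it at startup. Consumers then see only providers registered under the same group. The Provider type and the strict same-group filter are illustrative, not ZuanZuan's actual framework code.

```python
import os
from dataclasses import dataclass
from typing import List

# Hypothetical environment variable name; the article only says the group ID
# is configured through an environment variable.
GROUP_ENV = "RPC_GROUP_ID"


@dataclass(frozen=True)
class Provider:
    host: str
    port: int
    group: str


def group_env(group: str) -> list:
    """Env entry the platform would add to the pod spec for a given group."""
    return [{"name": GROUP_ENV, "value": group}]


def current_group(default: str = "default") -> str:
    """Inside the container, the RPC framework reads its own group ID."""
    return os.environ.get(GROUP_ENV, default)


def select_providers(providers: List[Provider], group: str) -> List[Provider]:
    """Grouped discovery: callers only see providers registered in their own group."""
    return [p for p in providers if p.group == group]


providers = [
    Provider("10.0.0.11", 9000, "default"),
    Provider("10.0.0.12", 9000, "gray"),
    Provider("10.0.0.13", 9000, "gray"),
]
print(select_providers(providers, "gray"))
```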

Web Service

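Web services have no registration center, and Nginx does not automatically add or remove pods as they start and stop. The solution uses container lifecycle hooks that call custom interfaces to bring a pod online in Nginx when it starts and take it offline before it stops.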
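Here is a hedged sketch of such hooks as a container-spec fragment. The Nginx admin URL and its POST/DELETE interface are hypothetical stand-ins for the custom interfaces, and POD_IP comes from the Downward API. Because postStart runs concurrently with the container's entrypoint, the sketch polls the local health endpoint before registering. On the way down, preStop sleeps briefly after deregistering so in-flight requests can drain before SIGTERM is sent.

```python
# Hypothetical admin endpoint of the Nginx governance service.
NGINX_ADMIN = "http://nginx-admin.example.com"


def nginx_lifecycle(app: str, port: int, health_path: str = "/health",
                    drain_seconds: int = 5) -> dict:
    """Container-spec fragment that adds/removes this pod from Nginx."""
    server = f"$POD_IP:{port}"
    register = (
        # postStart runs alongside the entrypoint, so wait for the app first.
        f"until curl -fsS -o /dev/null http://127.0.0.1:{port}{health_path}; do sleep 1; done"
        f' && curl -fsS -X POST "{NGINX_ADMIN}/upstreams/{app}/servers?addr={server}"'
    )
    deregister = (
        f'curl -fsS -X DELETE "{NGINX_ADMIN}/upstreams/{app}/servers?addr={server}"'
        # preStop finishes before SIGTERM, so this sleep lets requests drain.
        f" && sleep {drain_seconds}"
    )
    return {
        # Downward API: expose the pod IP to the hook commands.
        "env": [{"name": "POD_IP", "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}}}],
        "lifecycle": {
            "postStart": {"exec": {"command": ["sh", "-c", register]}},
            "preStop": {"exec": {"command": ["sh", "-c", deregister]}},
        },
    }


print(nginx_lifecycle("web-shop", 8080)["lifecycle"]["preStop"])
```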

Custom Controller

The custom controller addresses IP drift, log loss, and the kubelet's subPath limitations. It reimplements the RC (ReplicationController) logic so that existing pods are reused rather than replaced, and it provides an emptyDir-based alternative to subPath. As a result, an image-version upgrade is applied to the running pod seamlessly, while a change to resource settings recreates the pod.
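The reuse decision can be sketched as a small planning function. It relies on a Kubernetes behavior: a running pod's container image field may be updated in place. In that case the kubelet restarts only that container, the pod keeps its IP, and emptyDir contents survive the restart. Resource fields, by contrast, could not be changed on a running pod before the newer in-place resize feature, so the pod must be recreated. The ContainerSpec type is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ContainerSpec:
    image: str
    cpu_limit: str
    memory_limit: str


def plan_update(current: ContainerSpec, desired: ContainerSpec) -> str:
    """Decide how a pod-reusing controller could reconcile one container."""
    if current == desired:
        return "noop"
    if (current.cpu_limit, current.memory_limit) != (desired.cpu_limit, desired.memory_limit):
        # Resource settings cannot be patched on a running pod (outside the
        # newer in-place resize feature), so the pod is rebuilt.
        return "recreate"
    # Only the image differs: patch spec.containers[*].image on the existing
    # pod; the kubelet restarts that container and the pod keeps its IP.
    return "patch_image"


current = ContainerSpec("harbor.example.com/prod/order:v1", "2", "4Gi")
print(plan_update(current, replace(current, image="harbor.example.com/prod/order:v2")))
```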

Container Monitoring

The monitoring stack evolved from Heapster to Metrics-server and finally to Prometheus scraping cAdvisor metrics. One gap remains: the Prometheus data has no mapping from IP address to pod, so that mapping has to be added separately.
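One possible way to fill the gap is to join cAdvisor series with a table mapping (namespace, pod) to IP, for example one built from each Pod's status.podIP in the Kubernetes API. The sketch assumes the newer cAdvisor label names namespace and pod (older releases used pod_name), along with a simplified sample shape. Its docstring shows an equivalent PromQL join against kube-state-metrics' kube_pod_info, which carries a pod_ip label.

```python
from typing import Any, Dict, List, Tuple


def enrich_with_pod_ip(samples: List[Dict[str, Any]],
                       pod_ips: Dict[Tuple[str, str], str]) -> List[Dict[str, Any]]:
    """Attach a pod_ip label to each cAdvisor sample whose (namespace, pod) is known.

    With kube-state-metrics installed, the same join can be done in PromQL:
        container_memory_working_set_bytes
          * on (namespace, pod) group_left (pod_ip) kube_pod_info
    """
    enriched = []
    for sample in samples:
        labels = dict(sample["labels"])  # copy so the input is not mutated
        ip = pod_ips.get((labels.get("namespace"), labels.get("pod")))
        if ip is not None:
            labels["pod_ip"] = ip
        enriched.append({"labels": labels, "value": sample["value"]})
    return enriched


samples = [
    {"labels": {"namespace": "prod", "pod": "order-1", "container": "app"}, "value": 0.5},
    {"labels": {"namespace": "prod", "pod": "gone-2", "container": "app"}, "value": 1.0},
]
print(enrich_with_pod_ip(samples, {("prod", "order-1"): "10.0.0.12"}))
```

Samples for pods missing from the table pass through unchanged, so a stale mapping degrades to the original data rather than dropping series.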

Log Collection

Java applications typically write logs to multiple files rather than to stdout, which produces large log volumes and risks log loss. The platform prevents loss by writing logs to a hostPath directory on the host, so the files survive when the container is removed. It handles the volume with an asynchronous agent that watches Docker lifecycle events and generates Flume configurations, which decouples log collection from the cloud platform.
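A minimal sketch of such an agent follows. It consumes event dicts shaped like the JSON that docker events --format '{{json .}}' emits. On each container start, it renders a Flume configuration with three parts: a TAILDIR source over that pod's hostPath log directory, a file channel, and a Kafka sink. Several details are assumptions: the Kafka destination, the agent name, the /var/lib/flume state directory, and a hostPath layout keyed by the io.kubernetes.pod.* container labels. Writing the files to disk and reloading Flume are left out.

```python
def render_flume_config(container_id: str, log_dir: str, kafka_servers: str,
                        topic: str, agent: str = "a1") -> str:
    """Flume properties that tail one container's hostPath log directory into Kafka."""
    cid = container_id[:12]
    src, ch, sink = f"src_{cid}", f"ch_{cid}", f"sink_{cid}"
    state = f"/var/lib/flume/{cid}"  # hypothetical state directory
    props = [
        f"{agent}.sources = {src}",
        f"{agent}.channels = {ch}",
        f"{agent}.sinks = {sink}",
        # TAILDIR tails every file in the directory whose name matches the regex.
        f"{agent}.sources.{src}.type = TAILDIR",
        f"{agent}.sources.{src}.filegroups = f1",
        f"{agent}.sources.{src}.filegroups.f1 = {log_dir}/.*log",
        f"{agent}.sources.{src}.positionFile = {state}/taildir_position.json",
        f"{agent}.sources.{src}.channels = {ch}",
        # A file channel buffers on disk, so a slow sink does not drop events.
        f"{agent}.channels.{ch}.type = file",
        f"{agent}.channels.{ch}.checkpointDir = {state}/checkpoint",
        f"{agent}.channels.{ch}.dataDirs = {state}/data",
        f"{agent}.sinks.{sink}.type = org.apache.flume.sink.kafka.KafkaSink",
        f"{agent}.sinks.{sink}.kafka.bootstrap.servers = {kafka_servers}",
        f"{agent}.sinks.{sink}.kafka.topic = {topic}",
        f"{agent}.sinks.{sink}.channel = {ch}",
    ]
    return "\n".join(props) + "\n"


class LogAgent:
    """Turns Docker container events into per-container Flume configs."""

    def __init__(self, host_log_root: str, kafka_servers: str, topic: str):
        self.host_log_root = host_log_root
        self.kafka_servers = kafka_servers
        self.topic = topic
        self.configs = {}  # container ID -> Flume properties text

    def handle(self, event: dict) -> None:
        if event.get("Type") != "container":
            return
        action = event.get("Action")
        cid = event["Actor"]["ID"]
        attrs = event["Actor"].get("Attributes", {})
        if action == "start":
            # Hypothetical hostPath layout: <root>/<namespace>/<pod>.
            ns = attrs.get("io.kubernetes.pod.namespace", "default")
            pod = attrs.get("io.kubernetes.pod.name", cid[:12])
            log_dir = f"{self.host_log_root}/{ns}/{pod}"
            self.configs[cid] = render_flume_config(cid, log_dir, self.kafka_servers, self.topic)
        elif action == "destroy":
            # Files stay on the host; a real agent would let TAILDIR drain first.
            self.configs.pop(cid, None)
```

Because the agent reacts to Docker events on each host, the cloud platform never has to call into the log pipeline when it deploys a pod.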
