Cloud Native · 13 min read

Building a Container Platform at Ximalaya: Practices, Principles, and Evolution

The article chronicles Ximalaya's journey from early Docker-based Java project templates to a mature Kubernetes-driven container platform, detailing development principles, health‑check strategies, deployment workflows, middleware integration, and lessons learned about scaling, automation, and collaborative engineering.

Top Architect

The author, a senior architect, recounts Ximalaya's containerization journey that began in 2016 with Java projects, Docker templates, and Jenkins, eventually evolving into a full‑stack Kubernetes environment.

Key principles were established early: developers should not need to write Dockerfiles or understand container internals; containers must have seamless IP connectivity across environments; three Kubernetes clusters (Test, UAT, Production) are managed with environment-specific configurations; and containers that fail to start should preserve their state for debugging.

Migration from Marathon to Kubernetes introduced a custom Docker release system, the barge CLI (leveraging Google Jib to build images without Docker on developers' machines), and a naming story linking Harbor and barge. Integration with the company's release platform further abstracted physical‑machine differences.

Initially, containers ran multiple processes (ssh, Nile) managed by runit to satisfy developer familiarity, but later shifted toward the one‑process‑per‑container model as Kubernetes matured.

Service discovery evolved from a Nile process registering instances in ZooKeeper to registration in Consul, where Nginx's upsync module kept upstream lists in step with changing service IPs.

Health-check practices went through several iterations: readiness probes moved from HTTP/TCP to exec and back to an HTTP /healthcheck endpoint after configuration burdens and rare RPC-only failures surfaced; liveness probes were eventually dropped to avoid disruptive restarts, relying instead on readiness probes and external alerts.
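The final shape described above, an HTTP readiness endpoint with no liveness probe, can be sketched as follows. This is a minimal illustration, not the article's actual implementation: the `/healthcheck` path matches the text, while the `ready` flag and helper names are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// ready flips to true once startup work (config load, RPC
// registration, warmup) completes; until then the readiness
// probe fails and the pod receives no traffic.
var ready atomic.Bool

// healthStatus returns the HTTP status the probe endpoint emits:
// 200 when the instance can serve, 503 while it is still starting.
func healthStatus() int {
	if ready.Load() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// healthcheck is the handler the kubelet's HTTP readiness probe hits.
func healthcheck(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(healthStatus())
}

func main() {
	fmt.Println(healthStatus()) // 503 before startup completes
	ready.Store(true)
	fmt.Println(healthStatus()) // 200 once marked ready
}
```

With no liveness probe configured, a failing readiness probe only removes the pod from Service endpoints; it never triggers a kubelet restart, which matches the article's preference for external alerting over automatic kills.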

Release strategies adopted dual Deployments for blue‑green style rollouts, with discussions of gray deployments and the potential use of OpenKruise's CloneSet CRD to simplify the process.
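The dual-Deployment idea is usually wired up by pointing one Service at whichever Deployment carries the "active" label and flipping that label selector at cutover. A minimal sketch of the switching logic, with all names (`color` label, `nextColor`) assumed for illustration:

```go
package main

import "fmt"

// Blue-green with two Deployments: both run behind one Service,
// which selects pods by a "color" label. Flipping the selector
// cuts all traffic over to the other Deployment at once.

// nextColor returns the Deployment that should receive traffic
// after the current rollout finishes.
func nextColor(active string) string {
	if active == "blue" {
		return "green"
	}
	return "blue"
}

// serviceSelector builds the label selector the Service would
// carry after the flip.
func serviceSelector(app, color string) map[string]string {
	return map[string]string{"app": app, "color": color}
}

func main() {
	active := "blue"
	target := nextColor(active)
	fmt.Println(target)
	fmt.Println(serviceSelector("album-service", target))
}
```

A CloneSet from OpenKruise, as the article notes, would fold this two-Deployment bookkeeping into a single CRD with built-in partitioned rollout.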

The k8s-sync component was built to listen to pod lifecycle events, invoke custom Web/RPC service registration/unregistration APIs, and synchronize pod metadata to MySQL, providing zero‑downtime deployments and robust preStop handling.
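The core of such a component is an event loop that reacts to pod additions and deletions by calling registration APIs. The sketch below stands in for what a client-go informer would deliver; the `PodEvent` shape, `Registry` interface, and in-memory registry are all hypothetical simplifications of the k8s-sync behavior described above.

```go
package main

import "fmt"

// PodEvent is a simplified pod lifecycle notification, standing in
// for what a client-go informer would deliver to k8s-sync.
type PodEvent struct {
	Type string // "ADDED" or "DELETED"
	Name string
	IP   string
}

// Registry abstracts the company's Web/RPC service
// registration/unregistration APIs.
type Registry interface {
	Register(name, ip string) error
	Unregister(name, ip string) error
}

// syncPods drains pod events and keeps the service registry (and,
// in the real system, a MySQL metadata table) in step with the set
// of running pods.
func syncPods(events <-chan PodEvent, reg Registry) {
	for ev := range events {
		switch ev.Type {
		case "ADDED":
			if err := reg.Register(ev.Name, ev.IP); err != nil {
				fmt.Println("register failed:", err)
			}
		case "DELETED":
			if err := reg.Unregister(ev.Name, ev.IP); err != nil {
				fmt.Println("unregister failed:", err)
			}
		}
	}
}

// memRegistry is a toy in-memory Registry for demonstration.
type memRegistry struct{ live map[string]string }

func (m *memRegistry) Register(name, ip string) error {
	m.live[name] = ip
	return nil
}

func (m *memRegistry) Unregister(name, ip string) error {
	delete(m.live, name)
	return nil
}

func main() {
	reg := &memRegistry{live: map[string]string{}}
	events := make(chan PodEvent, 2)
	events <- PodEvent{"ADDED", "album-7f9c", "10.1.2.3"}
	events <- PodEvent{"DELETED", "album-7f9c", "10.1.2.3"}
	close(events)
	syncPods(events, reg)
	fmt.Println(len(reg.live)) // prints 0: the deleted pod was unregistered
}
```

Driving unregistration from the deletion event (rather than from inside the container) is what makes preStop handling robust: traffic is drained from the registry before the process exits, giving zero-downtime rollouts.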

Reflecting on the experience, the author emphasizes the heavy effort required to integrate containers with existing middleware, the importance of collaborative growth, and the development of auxiliary tools like wrench to streamline debugging and improve developer self‑service.

The article concludes with an invitation for discussion, sharing of interview resources, and promotion of related open‑source projects.

Microservices · Cloud Native · Deployment · Kubernetes · DevOps · Containerization · Health Check
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large-scale distributed, and high-availability architectures, as well as architecture evolution driven by internet technologies. Idea-driven, sharing-oriented architects are welcome to exchange and learn together.
