Building a Cloud‑Native Container Platform at Ximalaya: Practices, Principles, and Evolution
This article recounts Ximalaya's journey from a simple Docker‑based project template to a full‑featured Kubernetes‑driven container platform, describing the principles, tools, health‑check strategies, deployment patterns, middleware integration, and lessons learned for building reliable cloud‑native services.
Starting at the end of 2016, the Ximalaya team built a containerization platform that primarily supports Java web and RPC projects. It began with Docker templates on Jenkins and later migrated to Kubernetes, while retaining a strong imprint of the company's internal practices.
The platform follows several core principles: developers do not need to write Dockerfiles or understand container details; in test environments, containers can be accessed directly, with IP connectivity across machines; three Kubernetes clusters (Test, UAT, Production) share configuration and a central deployment database; and pods that fail are kept in their failed state for debugging instead of being restarted endlessly.
Early on, the release workflow involved Jenkins building Docker images from compiled WAR/JAR files, pushing them to Harbor, and deploying via Marathon. To reduce friction, a custom CLI tool called barge was introduced, allowing developers to deploy with barge deploy and debug with barge exec -m $projectName. The tool leverages Google’s Jib library to build images directly from source code without requiring Docker on developer machines.
Initially, containers ran multiple processes (including an SSH daemon) managed by runit to ease adoption, but as Kubernetes matured the architecture shifted toward the one‑process‑per‑container model.
Health‑check handling evolved through several iterations: starting with a simple HTTP /healthcheck endpoint, adding TCP port checks for RPC services, experimenting with exec‑based probes, and finally settling on readiness probes that only use the HTTP health endpoint, while liveness probes were eventually omitted to avoid unnecessary restarts.
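The final probe arrangement described above can be sketched as a container spec fragment. This is only an illustration, not the team's actual configuration: the function name, image name, port, and timing values are assumptions, while the field names (readinessProbe, httpGet, periodSeconds, and so on) are standard Kubernetes container fields.

```python
import json


def build_container_spec(name, image, http_port, health_path="/healthcheck"):
    """Return a Kubernetes container spec (as a plain dict) with a readiness probe only.

    The timing values are illustrative assumptions, not values from the article.
    """
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": http_port}],
        # Readiness probe: traffic is routed to the pod only while the
        # HTTP health endpoint answers successfully.
        "readinessProbe": {
            "httpGet": {"path": health_path, "port": http_port},
            "initialDelaySeconds": 10,
            "periodSeconds": 5,
            "failureThreshold": 3,
        },
        # Deliberately no "livenessProbe" key: an unhealthy container is
        # taken out of rotation but is not restarted by the kubelet.
    }


# Hypothetical project and image names, for illustration only.
spec = build_container_spec("demo-web", "harbor.example.com/demo/demo-web:1.0.0", 8080)
print(json.dumps(spec, indent=2))
```

With this shape, a failing health endpoint removes the pod from service endpoints while leaving the process untouched, which matches the stated goal of avoiding unnecessary restarts.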
For deployments, the team adopted a dual‑Deployment strategy per project to enable gray release: each project has two Deployments, and during a release the one running the old version is scaled down as the one running the new version is scaled up. Alternatives such as OpenKruise’s CloneSet CRD are being evaluated to simplify the process.
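The replica arithmetic behind a dual‑Deployment gray release can be sketched as follows. This is a minimal sketch under the assumption that the total replica count stays constant and instances move in fixed-size batches; the function name and the batch parameter are hypothetical, and the article does not describe the team's actual scheduling logic.

```python
def gray_release_steps(total, batch):
    """Return the (old_replicas, new_replicas) pair after each release batch.

    Assumes the total replica count stays constant: every batch scales the
    new Deployment up and the old Deployment down by the same amount.
    """
    if total < 0 or batch <= 0:
        raise ValueError("total must be >= 0 and batch must be > 0")
    old, new = total, 0
    steps = [(old, new)]
    while new < total:
        moved = min(batch, total - new)  # the last batch may be smaller
        new += moved
        old -= moved
        steps.append((old, new))
    return steps


for old, new in gray_release_steps(total=5, batch=2):
    print(f"old Deployment: {old} replicas, new Deployment: {new} replicas")
```

Each pair would correspond to one pause point where the release can be observed and, if necessary, rolled back by reversing the last step.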
Integration with existing middleware is achieved via a component called k8s‑sync, which watches pod status, invokes upstream service registration/unregistration APIs, stores pod metadata in MySQL, and provides a “container cloud platform” UI for developers to query and troubleshoot issues.
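The core decision in a component like k8s‑sync can be sketched as a small event handler: register a pod with the upstream middleware when it is ready, and unregister it when it is deleted or no longer ready. All names here (PodEvent, FakeRegistry, sync) are hypothetical, and the in-memory registry stands in for the real registration APIs and the MySQL metadata store, which the article does not detail.

```python
from dataclasses import dataclass, field


@dataclass
class PodEvent:
    event_type: str  # "ADDED", "MODIFIED" or "DELETED", as in Kubernetes watch events
    pod_name: str
    pod_ip: str
    ready: bool      # whether the pod currently passes its readiness probe


@dataclass
class FakeRegistry:
    """In-memory stand-in for the upstream registration/unregistration APIs."""
    instances: dict = field(default_factory=dict)

    def register(self, pod_name, pod_ip):
        self.instances[pod_name] = pod_ip

    def unregister(self, pod_name):
        # Unregistering an unknown pod is a no-op, so repeated events are safe.
        self.instances.pop(pod_name, None)


def sync(event, registry):
    """Apply one pod event to the registry and return the action taken."""
    if event.event_type == "DELETED" or not event.ready:
        registry.unregister(event.pod_name)
        return "unregistered"
    registry.register(event.pod_name, event.pod_ip)
    return "registered"


registry = FakeRegistry()
print(sync(PodEvent("ADDED", "demo-web-1", "10.0.0.7", ready=False), registry))
print(sync(PodEvent("MODIFIED", "demo-web-1", "10.0.0.7", ready=True), registry))
print(sync(PodEvent("DELETED", "demo-web-1", "10.0.0.7", ready=True), registry))
```

Making both operations idempotent is a design choice worth noting: watch streams can replay or repeat events, so the handler should produce the same registry state no matter how often an event arrives.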
Reflecting on the journey, the authors note the heavy effort required to blend containers with legacy systems, the importance of communication and shared learning, and personal growth in both Java and Go concurrency models, concluding that the experience has turned containerization into a core part of Ximalaya’s engineering culture.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
