How I Built an Enterprise‑Grade Kubernetes Cluster Architecture from Scratch
This article recounts a year‑long journey of designing, implementing, and operating a multi‑environment Kubernetes architecture—including containerized workloads, unified logging, CI/CD pipelines, service governance with Istio, and private deployments—while sharing practical lessons and best‑practice recommendations for cloud‑native teams.
Preface
The IT industry is a training ground. After graduating in May 2020, I joined my first company under the mentorship of the CTO (an Alibaba Cloud MVP) and began a personal journey of learning and applying cloud‑native technologies.
From September 2020 onward I moved through phases of exploration, practice, and insight, focusing on Kubernetes, which I had first encountered in August 2018.
Key Aspects of an Enterprise‑Grade Kubernetes Cluster Architecture
The architecture spans three environments (production, pre‑release, and testing) and adds boundary services such as a unified log management platform, monitoring and alerting, distributed tracing, a unified management console, automatic certificate renewal, and traffic control.
Rearchitecting the Cluster and Full‑Scale Containerization
This "from zero to one" process involved:
Designing a containerization plan based on existing business.
Adding a Jumpserver bastion host.
Creating front‑end and back‑end service images.
Deploying separate Kubernetes clusters for testing and pre‑release, and refactoring the production cluster.
Implementing multi‑cluster CI/CD with GitLab‑Runner, GitLab, and Kustomize.
Defining log fields and output formats jointly with colleagues.
Assisting the back‑end team to fine‑tune legacy services.
Using Rancher for unified multi‑cluster management.
Automating certificate issuance and renewal with Cert‑Manager.
Writing shell scripts to check GitLab backups, bare‑metal service backups, and certificate expirations.
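The certificate‑expiry check in the last step could look roughly like the sketch below. The hostnames, the 30‑day threshold, and the helper names are illustrative assumptions, not the actual scripts:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a certificate-expiry check.
# Threshold and hosts are illustrative placeholders.

# Days from now until the given date string (GNU date).
days_until() {
  end_epoch=$(date -d "$1" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Fetch a certificate's notAfter date for host[:port] via openssl.
cert_end_date() {
  echo | openssl s_client -servername "$1" -connect "$1:${2:-443}" 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2
}

# Warn when a certificate is within WARN_DAYS of expiring.
WARN_DAYS=30
for host in "$@"; do
  left=$(days_until "$(cert_end_date "$host")")
  if [ "$left" -lt "$WARN_DAYS" ]; then
    echo "WARN: $host certificate expires in $left days"
  else
    echo "OK: $host certificate valid for $left days"
  fi
done
```

A cron job can run this daily against the domains served by the clusters; the same date arithmetic extends naturally to checking backup timestamps.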
Unified Log Management Platform
The platform consolidates logs from multiple Kubernetes clusters into a single Elasticsearch‑Kibana‑Logstash‑Kafka stack, with Filebeat, Metricbeat, and kube‑state‑metrics deployed per cluster. Logs are output in JSON format, namespaces are unique across clusters, and multi‑line logs are prohibited.
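A per‑cluster Filebeat configuration along these lines could collect the JSON container logs and ship them to Kafka for the ELK pipeline. The broker addresses, topic name, and cluster label below are illustrative assumptions, not the actual setup:

```yaml
# Hypothetical filebeat.yml sketch: collect JSON-formatted container
# logs and forward them to Kafka. Hosts, topic, and the cluster name
# are placeholders.
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log
    json.keys_under_root: true     # services are required to log JSON
    json.add_error_key: true

processors:
  - add_kubernetes_metadata: {}    # attach namespace/pod metadata
  - add_fields:
      target: cluster
      fields:
        name: prod                 # unique label per cluster

output.kafka:
  hosts: ["kafka-0:9092", "kafka-1:9092"]
  topic: "k8s-logs"
```

Because namespaces are unique across clusters and every line is a single JSON object, Logstash can route and parse events without per‑cluster special cases.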
CI/CD
GitLab‑Runner is used for automated deployments. The workflow is: developers push code to environment‑specific branches → image build on a designated pre‑release node → deployment according to .gitlab‑ci.yml rules.
Environment separation via branch naming.
Image builds run on a single pre‑release node to avoid production impact.
Reusable scripts and variables increase the repeatability of Kubernetes manifests.
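A `.gitlab-ci.yml` following the rules above might be sketched as below; the stage names, runner tag, and overlay paths are illustrative assumptions:

```yaml
# Hypothetical .gitlab-ci.yml sketch: branches map to environments,
# image builds are pinned to the pre-release runner, and Kustomize
# overlays drive deployment. Names and paths are placeholders.
stages:
  - build
  - deploy

build-image:
  stage: build
  tags: [pre-release]              # keep builds off production nodes
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy-test:
  stage: deploy
  only: [test]                     # environment separation via branches
  script:
    - cd deploy/overlays/test
    - kustomize edit set image "app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - kustomize build . | kubectl apply -f -

deploy-prod:
  stage: deploy
  only: [production]
  when: manual                     # gate production rollouts
  script:
    - cd deploy/overlays/production
    - kustomize edit set image "app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - kustomize build . | kubectl apply -f -
```

`CI_REGISTRY_IMAGE` and `CI_COMMIT_SHORT_SHA` are predefined GitLab CI variables; the Kustomize overlays are what make one set of base manifests reusable across all three environments.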
Service Governance
With increasing micro‑service adoption, we adopted Istio and Kong for traffic management, health checks, connection pooling, circuit breaking, retry, rate limiting, and tracing. EnvoyFilter and Lua scripts were used to integrate authentication services.
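The connection‑pooling, circuit‑breaking, and retry policies can be expressed with standard Istio resources along these lines. The service name `orders` and all thresholds are illustrative assumptions:

```yaml
# Hypothetical Istio DestinationRule: connection pooling and
# circuit breaking (outlier detection). Host and limits are
# placeholders.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders
spec:
  host: orders.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
    outlierDetection:              # eject misbehaving endpoints
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
---
# Retries with per-try timeouts via a VirtualService.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts: [orders]
  http:
    - route:
        - destination:
            host: orders
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
```

Rate limiting and the authentication integration sit on top of this: an EnvoyFilter can inject a Lua filter that calls out to the auth service before requests reach the route above.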
Private Deployment
For a 3D editor product with strict data confidentiality, several private‑cloud deployments were performed. Lessons learned include understanding customer‑specific service requirements, estimating resource needs, communicating technical details to non‑technical stakeholders, planning timelines, and coordinating with back‑end teams for configuration issues.
Conclusion
The IT industry remains a training ground. After nearly a year I have progressed through the entry, exploration, practice, and insight phases, and I continue to learn about cloud‑native technologies.
Outlook
Deepening understanding of Kubernetes and cloud‑native ecosystems.
Participating in open‑source contributions.
Continuing private‑deployment projects with higher data security demands.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and hope to accompany you through your operations career as we grow together.