How I Built an Enterprise‑Grade Kubernetes Cluster Architecture from Scratch
This article recounts a year‑long journey of designing, implementing, and operating a multi‑environment Kubernetes architecture—including containerized workloads, unified logging, CI/CD pipelines, service governance with Istio, and private deployments—while sharing practical lessons and best‑practice recommendations for cloud‑native teams.
Preface
The IT industry is a training ground. After graduating in May 2020, I joined my first company under the mentorship of the CTO (an Alibaba Cloud MVP) and began a personal journey of learning and applying cloud‑native technologies.
From September 2020 onward I moved through phases of exploration, practice, and insight, focusing on Kubernetes, which I had first encountered in August 2018.
Key Aspects of an Enterprise‑Grade Kubernetes Cluster Architecture
The architecture spans three environments (production, pre‑release, and testing) and adds boundary services such as a unified log management platform, monitoring and alerting, distributed tracing, a unified management console, automatic certificate renewal, and traffic control.
Rearchitecting the Cluster and Full‑Scale Containerization
This "from zero to one" process involved:
Designing a containerization plan based on existing business.
Adding a Jumpserver bastion host.
Creating front‑end and back‑end service images.
Deploying separate Kubernetes clusters for testing and pre‑release, and refactoring the production cluster.
Implementing multi‑cluster CI/CD with GitLab‑Runner, GitLab, and Kustomize.
Defining log fields and output formats jointly with colleagues.
Assisting the back‑end team to fine‑tune legacy services.
Using Rancher for unified multi‑cluster management.
Automating certificate issuance and renewal with Cert‑Manager.
Writing shell scripts to check GitLab backups, bare‑metal service backups, and certificate expirations.
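The certificate‑expiry check in the last step could look roughly like the sketch below. The hostnames, the 30‑day threshold, and the helper names are illustrative assumptions, not the actual scripts:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a certificate-expiry check.
# Threshold and hosts are illustrative placeholders.

# Days from now until the given date string (GNU date).
days_until() {
  end_epoch=$(date -d "$1" +%s)
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Fetch a certificate's notAfter date for host[:port] via openssl.
cert_end_date() {
  echo | openssl s_client -servername "$1" -connect "$1:${2:-443}" 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2
}

# Warn when a certificate is within WARN_DAYS of expiring.
WARN_DAYS=30
for host in "$@"; do
  left=$(days_until "$(cert_end_date "$host")")
  if [ "$left" -lt "$WARN_DAYS" ]; then
    echo "WARN: $host certificate expires in $left days"
  else
    echo "OK: $host certificate valid for $left days"
  fi
done
```

A cron job can run this daily against the domains served by the clusters; the same date arithmetic extends naturally to checking backup timestamps.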
Unified Log Management Platform
The platform consolidates logs from multiple Kubernetes clusters into a single Elasticsearch‑Kibana‑Logstash‑Kafka stack, with Filebeat, Metricbeat, and kube‑state‑metrics deployed per cluster. Logs are output in JSON format, namespaces are unique across clusters, and multi‑line logs are prohibited.
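A per‑cluster Filebeat configuration along these lines could collect the JSON container logs and ship them to Kafka for the ELK pipeline. The broker addresses, topic name, and cluster label below are illustrative assumptions, not the actual setup:

```yaml
# Hypothetical filebeat.yml sketch: collect JSON-formatted container
# logs and forward them to Kafka. Hosts, topic, and the cluster name
# are placeholders.
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log
    json.keys_under_root: true     # services are required to log JSON
    json.add_error_key: true

processors:
  - add_kubernetes_metadata: {}    # attach namespace/pod metadata
  - add_fields:
      target: cluster
      fields:
        name: prod                 # unique label per cluster

output.kafka:
  hosts: ["kafka-0:9092", "kafka-1:9092"]
  topic: "k8s-logs"
```

Because namespaces are unique across clusters and every line is a single JSON object, Logstash can route and parse events without per‑cluster special cases.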
CI/CD
GitLab‑Runner is used for automated deployments. The workflow is: developers push code to environment‑specific branches → image build on a designated pre‑release node → deployment according to .gitlab‑ci.yml rules.
Environment separation via branch naming.
Image builds run on a single pre‑release node to avoid production impact.
Reusable scripts and variables increase the repeatability of Kubernetes manifests.
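A `.gitlab-ci.yml` following the rules above might be sketched as below; the stage names, runner tag, and overlay paths are illustrative assumptions:

```yaml
# Hypothetical .gitlab-ci.yml sketch: branches map to environments,
# image builds are pinned to the pre-release runner, and Kustomize
# overlays drive deployment. Names and paths are placeholders.
stages:
  - build
  - deploy

build-image:
  stage: build
  tags: [pre-release]              # keep builds off production nodes
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy-test:
  stage: deploy
  only: [test]                     # environment separation via branches
  script:
    - cd deploy/overlays/test
    - kustomize edit set image "app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - kustomize build . | kubectl apply -f -

deploy-prod:
  stage: deploy
  only: [production]
  when: manual                     # gate production rollouts
  script:
    - cd deploy/overlays/production
    - kustomize edit set image "app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - kustomize build . | kubectl apply -f -
```

`CI_REGISTRY_IMAGE` and `CI_COMMIT_SHORT_SHA` are predefined GitLab CI variables; the Kustomize overlays are what make one set of base manifests reusable across all three environments.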
Service Governance
With increasing micro‑service adoption, we adopted Istio and Kong for traffic management, health checks, connection pooling, circuit breaking, retry, rate limiting, and tracing. EnvoyFilter and Lua scripts were used to integrate authentication services.
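The connection‑pooling, circuit‑breaking, and retry policies can be expressed with standard Istio resources along these lines. The service name `orders` and all thresholds are illustrative assumptions:

```yaml
# Hypothetical Istio DestinationRule: connection pooling and
# circuit breaking (outlier detection). Host and limits are
# placeholders.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders
spec:
  host: orders.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
    outlierDetection:              # eject misbehaving endpoints
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
---
# Retries with per-try timeouts via a VirtualService.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts: [orders]
  http:
    - route:
        - destination:
            host: orders
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
```

Rate limiting and the authentication integration sit on top of this: an EnvoyFilter can inject a Lua filter that calls out to the auth service before requests reach the route above.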
Private Deployment
For a 3D editor product with strict data confidentiality, several private‑cloud deployments were performed. Lessons learned include understanding customer‑specific service requirements, estimating resource needs, communicating technical details to non‑technical stakeholders, planning timelines, and coordinating with back‑end teams for configuration issues.
Conclusion
The IT industry remains a training ground. After nearly a year I have progressed through the entry, exploration, practice, and insight phases, and I continue to learn about cloud‑native technologies.
Outlook
Deepening understanding of Kubernetes and cloud‑native ecosystems.
Participating in open‑source contributions.
Continuing private‑deployment projects with higher data security demands.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and hope to accompany you through your operations career as we grow together.