
From Zero to Production: Building an Enterprise‑Grade Kubernetes Architecture

This article chronicles a recent graduate’s year‑long journey mastering cloud‑native technologies, detailing the design and implementation of a multi‑cluster Kubernetes architecture, unified logging, CI/CD pipelines, service governance with Istio, and private deployment strategies for a 3D editor platform.

Efficient Ops

Preface

IT is a training ground.

After graduating in May 2020, I joined my first company and worked under the mentorship of the CTO, an Alibaba Cloud MVP. He left within three months, which gave me the chance to chart my own technical path.

From September 2020 onward I went through three phases: "venturing" (loneliness, pain, and hardship), "cultivation" (self‑encouragement), and "realization" (knowledge and mindset). Throughout, my work has been self‑directed and self‑driven.

I had been drawn to Kubernetes since August 2018, and in July 2019 I decided to pursue a cloud‑native career.

When I started at the company in June 2020, I ran into the chaotic cluster environment typical of a startup: only the front‑end ran on Kubernetes, production resources were scarce, GitLab ran inside Kubernetes, and permissions were messy.

June 2020: built an ELFK‑based logging system that collected only gateway (nginx‑ingress) logs.

July 2020: led the migration of all business services to Kubernetes, from zero to one.

August‑September 2020: rebuilt the cluster and CI/CD pipeline, added test and pre‑release environments, switched the gateway from nginx‑ingress to Kong‑ingress, extracted GitLab from Kubernetes, automated certificate issuance with cert‑manager, introduced a bastion host, and enabled multi‑cluster deployments via gitlab‑runner.

October 2020: focused on a multi‑layer monitoring and alerting system and took part in a private‑cloud deployment project.

November 2020: centered on Istio service governance, validating connectivity, security, flow control, and visibility, and developed an EnvoyFilter plugin for authentication.

December 2020‑January 2021: unified logging for services across multiple Kubernetes clusters and bare‑metal servers.

January‑February 2021: migrated the pre‑release Kong‑ingress to Istio and integrated certificate, monitoring, and logging services.

March‑May 2021: worked on private‑cloud deployments, Istio production rollout, and related tasks.

During the first year I created 13 code repositories and wrote over 130 technical documents.

In early June 2020 I drafted an enterprise‑grade Kubernetes cluster architecture with three environments (production, pre‑release, test), surrounded by supporting services such as a unified log platform, monitoring, tracing, a management console, automatic certificate renewal, and flow control.

Key Points of the Enterprise‑Grade Kubernetes Architecture

Rebuilding Cluster Architecture and Full Business Containerization

This was a "from zero to one" journey right after graduation.

Major steps:

Design a containerization scheme based on existing business.

Add a bastion host (Jumpserver).

Create front‑end and back‑end service images.

Set up test, pre‑release, and revamped production Kubernetes clusters.

Implement multi‑cluster CI/CD using GitLab‑Runner, GitLab, and Kustomize.

Define log fields and output formats with teammates.

Assist the back‑end team in fine‑tuning legacy bare‑metal code.

Use Rancher for unified multi‑cluster management.

Automate certificate issuance and renewal with Cert‑Manager.

Write Shell scripts to check GitLab backups, bare‑metal service backups, and certificate expiry.
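
The certificate‑expiry check from the last step can be sketched as follows. The article's scripts were written in Shell and are not shown; this is a Python sketch of the same idea, and the 30‑day threshold and all function names are assumptions for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # hypothetical alert threshold, not taken from the article


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and a certificate's notAfter.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". A negative result means
    the certificate has already expired.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a TLS connection and return the peer certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_host(host: str) -> str:
    """Return a one-line status suitable for a cron job or alert hook."""
    remaining = days_until_expiry(fetch_not_after(host), datetime.now(timezone.utc))
    status = "WARN" if remaining < WARN_DAYS else "OK"
    return f"{status} {host}: certificate expires in {remaining} days"
```

Calling `check_host` with each public hostname from a scheduled job would give a safety net alongside Cert‑Manager's own renewal; the date arithmetic is kept in a separate function so it can be tested without network access.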

Unified Log Management Platform

This project was my biggest personal achievement in the past year.

Implementation ideas:

Ensure namespace names are unique across clusters.

Keep Elasticsearch, Kibana, Logstash, and Kafka outside the clusters and share them.

Deploy Filebeat, Metricbeat, and kube‑state‑metrics in each cluster.

Standardize metric tags.

Output logs in JSON, with no multiline entries.

Result: unified logging across multiple clusters and environments.
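
To illustrate the "JSON without multiline entries" rule, here is a minimal sketch of a Python logging formatter that emits one JSON object per line and tags it with its origin. The field names (`cluster`, `namespace`, `service`, `stack_trace`) are assumptions; the article does not list the fields the team agreed on.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format each record as a single-line JSON object."""

    def __init__(self, cluster: str, namespace: str, service: str):
        super().__init__()
        # Static tags identifying where the log line came from (hypothetical names).
        self.static_fields = {
            "cluster": cluster,
            "namespace": namespace,
            "service": service,
        }

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            **self.static_fields,
        }
        if record.exc_info:
            # Tracebacks span many lines; json.dumps escapes the newlines,
            # so the whole entry still occupies exactly one output line.
            entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)


def build_logger(cluster: str, namespace: str, service: str) -> logging.Logger:
    """Attach the formatter to a stream handler writing to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonLineFormatter(cluster, namespace, service))
    logger = logging.getLogger(service)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because every entry is one line of valid JSON, a collector such as Filebeat can ship it without multiline stitching rules, and the origin tags keep entries from different clusters distinguishable in a shared Elasticsearch.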

CI/CD

We chose GitLab‑Runner for automated deployment. Repository creation guidelines are available online.

Workflow: developers push code to environment‑specific branches (production branch requires manager merge) → image build on a designated pre‑release node → pod deployment according to .gitlab-ci.yml rules.

Separate environments by branch.

Build images on a single pre‑release node to avoid production impact and enable image reuse.

Leverage built‑in variables and custom scripts to increase reusability of Kubernetes manifests.
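
The branch‑per‑environment idea can be sketched as a small helper that turns a branch name and commit SHA into deployment commands. This is an illustration rather than the team's pipeline: the branch names, kube contexts, overlay paths, registry URL, and the image name `app` are all hypothetical.

```python
# Hypothetical branch -> environment mapping. Branch names, kube contexts,
# and Kustomize overlay paths are illustrative, not taken from the article.
ENVIRONMENTS = {
    "test": {"context": "test-cluster", "overlay": "deploy/overlays/test"},
    "pre-release": {"context": "pre-cluster", "overlay": "deploy/overlays/pre-release"},
    "production": {"context": "prod-cluster", "overlay": "deploy/overlays/production"},
}


def deploy_commands(branch: str, image_repo: str, commit_sha: str) -> list[str]:
    """Return the shell commands a CI job would run for the given branch.

    The image is tagged with the commit SHA, so an image built once can be
    reused when the same commit is promoted to another environment.
    """
    env = ENVIRONMENTS.get(branch)
    if env is None:
        raise ValueError(f"branch {branch!r} is not mapped to an environment")
    image = f"{image_repo}:{commit_sha}"
    return [
        # Subshell so the working directory change does not leak into later steps.
        f"(cd {env['overlay']} && kustomize edit set image app={image})",
        f"kubectl --context {env['context']} apply -k {env['overlay']}",
    ]


def render_job(branch: str, commit_sha: str) -> str:
    """Join the commands into a script body, one command per line."""
    repo = "registry.example.com/editor/api"  # hypothetical registry path
    return "\n".join(deploy_commands(branch, repo, commit_sha))
```

In a GitLab job, the predefined variables `CI_COMMIT_REF_NAME` and `CI_COMMIT_SHORT_SHA` could supply the branch and SHA, which is one way built‑in variables keep the same manifests reusable across clusters.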

Service Governance

As the services moved toward microservices, managing east‑west traffic became critical. We also needed A/B testing, which Kong offers only as a paid feature, and that prompted deeper service‑governance work.

Key aspects used:

Load balancing: least‑connections for base services, consistent‑hash for business services.

Health checks: e.g., remove after 30 s of inactivity, three errors within 10 s, check interval 10 s.

Connection pool: max 10 requests per instance, each connection handles 2 requests then closes, 3 retries, 500 ms timeout.

Circuit breaking based on health checks and connection pool.

Retry policy: up to 3 retries, 2 s timeout per call.

Rate limiting: to be applied when user count grows.

Link tracing.

Innovation: integrated authentication service with Istio via EnvoyFilter and Lua.
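
The load‑balancing, health‑check, connection‑pool, and retry settings above could be expressed as Istio resources roughly as follows. This sketch builds the manifests as Python dictionaries and prints JSON, which kubectl accepts. The mapping of the article's numbers onto Istio fields is my reading (for example, treating the health check as outlier detection that ejects an instance for 30 s after three consecutive 5xx errors, scanned every 10 s); the resource names, hosts, and hash header are hypothetical.

```python
import json

# Istio's closest match to "least connections"; the older LEAST_CONN
# value is deprecated in favor of LEAST_REQUEST.
LEAST_REQUEST = {"simple": "LEAST_REQUEST"}


def consistent_hash(header: str) -> dict:
    """Load-balancer setting that pins requests to instances by an HTTP header."""
    return {"consistentHash": {"httpHeaderName": header}}


def destination_rule(name: str, host: str, load_balancer: dict) -> dict:
    """DestinationRule with connection pool limits and outlier detection."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": name},
        "spec": {
            "host": host,
            "trafficPolicy": {
                "loadBalancer": load_balancer,
                "connectionPool": {
                    "tcp": {"connectTimeout": "500ms"},
                    "http": {
                        "http2MaxRequests": 10,         # assumed reading of "max 10 requests"
                        "maxRequestsPerConnection": 2,  # close a connection after 2 requests
                        "maxRetries": 3,
                    },
                },
                "outlierDetection": {
                    "consecutive5xxErrors": 3,   # three errors...
                    "interval": "10s",           # ...checked every 10 s...
                    "baseEjectionTime": "30s",   # ...eject for 30 s (assumed reading)
                },
            },
        },
    }


def virtual_service(name: str, host: str) -> dict:
    """VirtualService with the retry policy: 3 attempts, 2 s per try."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "route": [{"destination": {"host": host}}],
                    "retries": {"attempts": 3, "perTryTimeout": "2s"},
                }
            ],
        },
    }


def render(manifest: dict) -> str:
    """Serialize a manifest as indented JSON for `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

For example, `render(destination_rule("orders", "orders.default.svc.cluster.local", consistent_hash("x-user-id")))` yields a manifest for a business service, while passing `LEAST_REQUEST` would suit a base service. In Istio, circuit breaking is the combined effect of the `connectionPool` and `outlierDetection` blocks, which matches the "based on health checks and connection pool" item.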

Private Deployment

Our flagship product is a 3D editor with strict data confidentiality, requiring several private‑cloud deployments.

Learnings:

Business: identify required services and plan routing.

Cluster: estimate resource needs based on customer requirements.

Communication: translate technical details for non‑technical customers and internal ops.

Timeline: plan preparation, deployment, testing, and delivery phases.

Coordination: involve back‑end engineers for configuration issues.
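
For the cluster‑sizing point, a rough first estimate can be computed from per‑service resource requests. This is only a sketch: the service figures in the usage note, the headroom fraction, and all names are hypothetical, and a real estimate would also account for system reservations, storage, and pod scheduling constraints.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """Resource requests for one service (all values are per replica)."""
    name: str
    replicas: int
    cpu_millicores: int
    memory_mib: int


def nodes_needed(
    services: list[ServiceRequest],
    node_cpu_millicores: int,
    node_memory_mib: int,
    headroom: float = 0.3,
) -> int:
    """Lower bound on worker nodes, keeping `headroom` of each node free.

    Takes the larger of the CPU-driven and memory-driven node counts.
    It ignores bin-packing, so the real number may be higher.
    """
    total_cpu = sum(s.replicas * s.cpu_millicores for s in services)
    total_mem = sum(s.replicas * s.memory_mib for s in services)
    usable_cpu = node_cpu_millicores * (1 - headroom)
    usable_mem = node_memory_mib * (1 - headroom)
    by_cpu = math.ceil(total_cpu / usable_cpu)
    by_mem = math.ceil(total_mem / usable_mem)
    return max(by_cpu, by_mem, 1)
```

As a usage example, three replicas of an API service at 500m CPU and 1024 MiB plus two replicas of a rendering service at 2000m and 4096 MiB total 5500m and 11264 MiB; on nodes with 4000m and 8192 MiB and 25% headroom, the function returns 2.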

Private deployment tests how well an ops engineer knows both the business and the clusters, and it stretches the whole skill set.

Summary

I view IT as a training ground. Nearly a year after graduation, I have passed through the entry, venturing, and cultivation stages and am now in the realization stage.

There is still much to learn; technology has no end, and the cloud‑native path remains full of opportunities that require perseverance.

Outlook

Realization: deepen cognition and mindset.

Exploration: venture beyond the familiar.

Continue expanding cloud‑native expertise around Kubernetes, contribute to open source, and get faster at custom development on top of existing tools.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
