
From Rookie to Cloud‑Native Architect: Building an Enterprise Kubernetes Cluster

The author chronicles a year‑long, hands‑on journey from fresh graduate to cloud‑native specialist. The article covers the design and implementation of an enterprise‑grade Kubernetes architecture (multi‑cluster logging, CI/CD pipelines, Istio service mesh, monitoring, and private‑deployment strategies) and shares practical lessons learned along the way.

ITFLY8 Architecture Home

Sep 5, 2021


Preface

IT is a dojo!

After graduating in May 2020, I joined my first company, where I was mentored by the CTO, an Alibaba Cloud MVP. I learned a great deal. After three months my mentor left, which gave me the chance to forge my own path.

From September 2020 onward I went through three phases: "venturing" (loneliness, pain, and perseverance), "cultivation" (self‑encouragement), and "realization" (cognitive and philosophical growth). All of my technical work since then has been self‑driven.

In July 2019 I decided to pursue a cloud‑native career, having fallen in love with Kubernetes in August 2018.

In early June 2020 I found the startup's cluster environment in chaos: only the front‑end services had been moved to Kubernetes, production resources were scarce, GitLab itself ran inside Kubernetes, and permissions were tangled. I proposed several solutions (see my blog post).

June 2020: built an ELFK‑based logging system that collected only gateway (nginx‑ingress) logs.

July 2020: led the migration of the entire business onto Kubernetes, taking it from zero to one.

August‑September 2020: rebuilt the clusters and CI/CD, added test and pre‑release environments, switched the gateway from nginx‑ingress to kong‑ingress, moved GitLab out of Kubernetes, automated certificate issuance with cert‑manager, introduced a bastion host to clean up permissions, and used gitlab‑runner for multi‑cluster deployments.

October 2020: focused on building a monitoring and alerting system covering three dimensions (business, application, and operating system).

November 2020: focused on Istio service governance, validating connectivity, security, traffic control, and observability in the test environment, and developed an EnvoyFilter plugin for authentication.

December 2020‑January 2021: built a unified logging platform for multiple Kubernetes clusters and bare‑metal services.

January‑February 2021: migrated the pre‑release environment's gateway from kong‑ingress to Istio, integrating it with the certificate service, monitoring, and logging.

March‑May 2021: worked on private‑cloud deployment, Istio production rollout, and related tasks.

During the year I created 13 code repositories and wrote over 130 technical documents.

In early June 2020 I drafted an enterprise‑grade Kubernetes cluster architecture with three environments (production, pre‑release, and test), surrounded by supporting services such as a unified log platform, monitoring, tracing, a management console, automatic certificate renewal, and flow control.

Key Parts of Enterprise Kubernetes Architecture

Rebuilding the Cluster Architecture and Containerizing the Entire Business

The reconstruction followed these steps:

1. Design a containerization plan based on the existing business.
2. Add a bastion host (Jumpserver).
3. Create front‑end and back‑end service images.
4. Set up test, pre‑release, and revamped production Kubernetes clusters.
5. Implement multi‑cluster CI/CD using GitLab‑Runner, GitLab, and Kustomize.
6. Define log fields and output formats with the team.
7. Assist the back‑end team in fine‑tuning bare‑metal services.
8. Use Rancher for unified multi‑cluster management.
9. Automate domain certificate issuance and renewal with Cert‑Manager.
10. Write shell scripts to check GitLab backups, bare‑metal service backups, and domain expiration.
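The checks in the last step were shell scripts. The following is a minimal Python sketch of the same idea, offered as an illustration only. The backup path handling, the thresholds, and the interpretation of "domain expiration" as the expiry date of a domain's TLS certificate are all assumptions. This is not the original script.

```python
import socket
import ssl
import time
from pathlib import Path
from typing import Optional

# Hypothetical thresholds: tune them to the real backup schedule and renewal window.
MAX_BACKUP_AGE_SECONDS = 26 * 3600  # one daily backup plus two hours of slack
MIN_CERT_DAYS_LEFT = 15


def newest_backup_age(backup_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Age in seconds of the newest file in backup_dir, or None if there is none."""
    now = time.time() if now is None else now
    d = Path(backup_dir)
    if not d.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in d.iterdir() if p.is_file()]
    return now - max(mtimes) if mtimes else None


def backup_is_fresh(backup_dir: str, now: Optional[float] = None) -> bool:
    """True only if a backup file exists and is younger than the threshold."""
    age = newest_backup_age(backup_dir, now)
    return age is not None and age <= MAX_BACKUP_AGE_SECONDS


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter value, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run from cron, a wrapper would raise an alert when `backup_is_fresh(...)` returns False, or when `days_left(fetch_not_after(host))` drops below `MIN_CERT_DAYS_LEFT`.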

Unified Log Management Platform

I consider this my biggest conceptual achievement.

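Key ideas:
- Namespaces must be unique across clusters, so that a namespace alone identifies where a log line came from once everything lands in the shared Elasticsearch.
- Elasticsearch, Kibana, Logstash, and Kafka run outside the clusters and are shared by all of them.
- Each cluster runs its own Filebeat, Metricbeat, and kube‑state‑metrics.
- Logs are JSON‑formatted, one event per line.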

The result is unified logging across multiple clusters and environments.
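The following is a minimal sketch of the log conventions above. It uses hypothetical field names (`cluster`, `namespace`, `service`) and a hypothetical per-namespace daily index pattern. The real field list was agreed with the team and is not reproduced here. The sketch shows why single-line JSON matters: a multi-line stack trace still travels as one event.

```python
import json
from datetime import datetime, timezone

# Assumption: each cluster injects its own name (e.g. via an env var or the log shipper).
CLUSTER = "prod-cluster"


def format_log(level: str, message: str, namespace: str, service: str, **extra: str) -> str:
    """Render one event as a single-line JSON object with a fixed set of fields."""
    record = {
        "@timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "level": level,
        "cluster": CLUSTER,
        "namespace": namespace,
        "service": service,
        "message": message,
        **extra,
    }
    # json.dumps escapes embedded newlines (stack traces) as "\n", so the event stays on one line
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))


def index_name(namespace: str, day: str) -> str:
    """One daily index per namespace; unambiguous only because namespaces are unique across clusters."""
    return f"logs-{namespace}-{day}"
```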

CI/CD

The automation tool chosen was gitlab-runner. Repository standards are documented in my blog.

Workflow: developers push code to the branch for the target environment (merging into the production branch requires a manager's approval) → the image is built on a designated node in the pre‑release cluster → the job deploys according to the rules in .gitlab-ci.yml.

- One branch per environment.
- Images are built on a single pre‑release node, which keeps build load off production and lets environments reuse the same image.
- GitLab's built‑in CI variables, combined with scripts, make the Kubernetes manifests reusable across environments.
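The branch-per-environment flow and the variable-driven manifests can be sketched in Python as follows. `CI_COMMIT_REF_NAME`, `CI_PROJECT_NAME`, `CI_REGISTRY_IMAGE`, and `CI_COMMIT_SHORT_SHA` are real GitLab predefined variables. The branch names, replica counts, and `<app>-<env>` namespace scheme are assumptions. The author's pipeline used Kustomize, not the simple string template shown here.

```python
from string import Template
from typing import Dict

# Hypothetical branch -> environment mapping; the real branch names may differ.
BRANCH_TO_ENV = {
    "test": "test",
    "pre-release": "pre-release",
    "master": "production",  # protected branch: merges need a manager's approval
}
REPLICAS = {"test": 1, "pre-release": 1, "production": 3}  # assumed sizes

DEPLOYMENT = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $app
  namespace: $namespace
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $app
  template:
    metadata:
      labels:
        app: $app
    spec:
      containers:
        - name: $app
          image: $image
""")


def render(ci_env: Dict[str, str]) -> str:
    """Render the Deployment for whichever environment the pushed branch maps to."""
    branch = ci_env["CI_COMMIT_REF_NAME"]
    env = BRANCH_TO_ENV.get(branch)
    if env is None:
        raise ValueError(f"branch {branch!r} does not deploy anywhere")
    app = ci_env["CI_PROJECT_NAME"]
    # Tagging by commit SHA lets a later stage reuse an image that was already built
    image = f'{ci_env["CI_REGISTRY_IMAGE"]}:{ci_env["CI_COMMIT_SHORT_SHA"]}'
    return DEPLOYMENT.substitute(
        app=app, namespace=f"{app}-{env}", replicas=REPLICAS[env], image=image
    )
```

In a job, this would be called with `dict(os.environ)` and the output piped to `kubectl apply -f -`. Suffixing the namespace with the environment is one simple way to keep namespaces unique across clusters, as the logging platform requires.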

Monitoring and Alerting System

Implemented monitoring across three dimensions: business, application, and operating system.

Business monitoring tracks custom metrics such as growth rate and error rate, requiring predefined standards and instrumentation.
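Once the standards are fixed, business metrics reduce to simple arithmetic. The following is a minimal sketch. The order-count metric and both thresholds (more than 1% errors, a drop of more than 30%) are hypothetical examples of the "predefined standards" a team would agree on.

```python
from typing import List

# Hypothetical thresholds standing in for the agreed business standards.
MAX_ERROR_RATE = 0.01     # alert above 1% failed requests
MIN_GROWTH_RATE = -0.30   # alert on a drop of more than 30% versus the previous window


def error_rate(errors: int, requests: int) -> float:
    """Share of failed requests in a window; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def growth_rate(current: float, previous: float) -> float:
    """Relative change versus the previous window: 0.25 means +25%, -0.4 means -40%."""
    if previous == 0:
        raise ValueError("growth rate is undefined when the previous window is zero")
    return (current - previous) / previous


def business_alerts(errors: int, requests: int, orders_now: int, orders_before: int) -> List[str]:
    """Compare this window's metrics against the thresholds and list any breaches."""
    alerts = []
    if error_rate(errors, requests) > MAX_ERROR_RATE:
        alerts.append("error rate above threshold")
    if growth_rate(orders_now, orders_before) < MIN_GROWTH_RATE:
        alerts.append("orders dropped sharply")
    return alerts
```

For example, 25 failures out of 1,000 requests (2.5%) combined with orders falling from 1,000 to 600 (-40%) trips both alerts.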

Application monitoring uses probes (external health checks) and introspection (internal status, transactions, performance) to feed events, logs, and metrics to the monitoring stack.
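The following is a minimal sketch of the two application-monitoring styles. `probe` is an external check that sees only the HTTP status. `Introspection` is internal state that the application reports about itself. The article does not name its monitoring stack, so the Prometheus text exposition format is used here only as one plausible target. The metric names and the 2-second timeout are assumptions.

```python
import http.client
import urllib.request
from collections import Counter


def probe(url: str, timeout: float = 2.0) -> bool:
    """Black-box probe: healthy only if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # HTTPError/URLError, refused connections and timeouts all derive from OSError
        return False


class Introspection:
    """White-box view: counters the application keeps about its own transactions."""

    def __init__(self) -> None:
        self.transactions: Counter = Counter()
        self.seconds_total = 0.0

    def record(self, outcome: str, seconds: float) -> None:
        self.transactions[outcome] += 1
        self.seconds_total += seconds

    def render(self) -> str:
        """Expose the counters in the Prometheus text exposition format."""
        lines = [
            "# HELP app_transactions_total Transactions processed, by outcome.",
            "# TYPE app_transactions_total counter",
        ]
        for outcome, count in sorted(self.transactions.items()):
            lines.append(f'app_transactions_total{{outcome="{outcome}"}} {count}')
        lines += [
            "# HELP app_transaction_seconds_total Time spent in transactions.",
            "# TYPE app_transaction_seconds_total counter",
            f"app_transaction_seconds_total {self.seconds_total}",
        ]
        return "\n".join(lines) + "\n"
```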

OS monitoring watches resource usage and saturation (CPU usage, load, etc.).
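A common saturation signal for the OS dimension is the load average per CPU. The following is a minimal, Unix-only sketch. `os.getloadavg` is unavailable on Windows, so Windows hosts would need a separate agent. Treating values above 1.0 as saturated is a rule of thumb, not a threshold taken from the article.

```python
import os
from typing import Optional


def load_saturation(load1: float, cpus: int) -> float:
    """1-minute load average per CPU; values above 1.0 mean runnable work is queuing."""
    return load1 / cpus


def current_saturation() -> Optional[float]:
    """This host's saturation, or None where the load average cannot be read (e.g. Windows)."""
    if not hasattr(os, "getloadavg"):
        return None
    try:
        load1, _, _ = os.getloadavg()
    except OSError:
        return None
    return load_saturation(load1, os.cpu_count() or 1)
```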

The system covers all three dimensions and extends beyond Kubernetes to bare‑metal servers and Windows hosts.

Service Governance

As the number of microservices grew, managing east‑west (service‑to‑service) traffic became critical. We also needed A/B testing, which Kong offers only as a paid feature, and this prompted the shift to a service mesh.

Key usage points:

- Load balancing: least connections for basic services, consistent hashing for business services.
- Health checks with custom thresholds.
- Connection pools that limit requests per instance.
- Circuit breaking based on the health checks and connection‑pool settings.
- Retry policy: up to three attempts, with a 2‑second timeout.
- Rate limiting, planned for future scaling.
- Distributed tracing.
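The points above map onto Istio's DestinationRule and VirtualService resources. The Python sketch below builds both as plain dicts and prints them as JSON, which `kubectl apply -f` accepts. The field names follow the `networking.istio.io/v1beta1` API. The host names, the hash header, and all numeric limits other than the retry policy are hypothetical.

```python
import json

# Field names follow Istio's networking.istio.io/v1beta1 API; host names, the hash
# header and the numeric limits are hypothetical values for illustration.


def destination_rule(host: str, business: bool) -> dict:
    """Traffic policy for one service: load balancing, connection pool, outlier detection."""
    if business:
        # Consistent hashing on a request header pins each user to one instance
        load_balancer = {"consistentHash": {"httpHeaderName": "x-user-id"}}
    else:
        # Least connections for basic services (newer Istio releases prefer LEAST_REQUEST)
        load_balancer = {"simple": "LEAST_CONN"}
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": host},
        "spec": {
            "host": host,
            "trafficPolicy": {
                "loadBalancer": load_balancer,
                # Connection pool: caps concurrent connections and queued requests per instance
                "connectionPool": {
                    "tcp": {"maxConnections": 100},
                    "http": {"http1MaxPendingRequests": 50, "maxRequestsPerConnection": 10},
                },
                # Outlier detection: passive health checking that ejects failing instances
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                    "maxEjectionPercent": 50,
                },
            },
        },
    }


def virtual_service(host: str) -> dict:
    """Route all HTTP traffic to the host with the retry policy from the list above."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [{"destination": {"host": host}}],
                # attempts is the number of retries Istio allows; 2s applies to each try
                "retries": {"attempts": 3, "perTryTimeout": "2s", "retryOn": "5xx,connect-failure"},
            }],
        },
    }


if __name__ == "__main__":
    items = [destination_rule("order-api", business=True), virtual_service("order-api")]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Istio's `attempts` field counts retries, not total tries. Check whether "three attempts" was meant to include the original request.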

Authentication was integrated with Istio through an EnvoyFilter that runs a Lua script.
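The article does not show the Lua script itself. As a language-neutral illustration, here is the decision such an authentication filter typically makes, sketched in Python. The `authorization` and `x-user-id` header names and the token verifier are hypothetical. In Envoy's Lua filter, the verifier would be a call to an auth service (for example via `request_handle:httpCall`), and the 401 would be sent with `request_handle:respond`.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical verifier: maps a bearer token to a user id, or None if the token is invalid.
# In the real filter this would be an HTTP call to the auth service.
Verifier = Callable[[str], Optional[str]]


def authenticate(headers: Dict[str, str], verify: Verifier) -> Tuple[int, Dict[str, str]]:
    """Return (401, {}) to reject, or (200, headers to forward upstream) to allow."""
    auth = headers.get("authorization", "")
    if not auth.startswith("Bearer "):
        return 401, {}
    user_id = verify(auth[len("Bearer "):])
    if user_id is None:
        return 401, {}
    # Overwrite any client-supplied identity header with the verified one
    return 200, {**headers, "x-user-id": user_id}
```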

Private Deployment

Our main product is a 3D editor that handles highly confidential data, so large enterprise customers require on‑premises deployments.

Key lessons learned:

- Understanding which services each customer needs and how traffic is routed between them.
- Estimating resource needs per cluster.
- Explaining technical details to non‑technical stakeholders.
- Planning timelines for preparation, deployment, testing, and delivery.
- Coordinating with back‑end engineers on configuration issues.

Private deployment is a real test of how well you know the business and the clusters, and it draws on an operations engineer's full skill set.
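Estimating resources per cluster is mostly arithmetic over resource requests. The following is a minimal sketch that uses hypothetical services, node sizes, and 30% headroom. It uses integer arithmetic to avoid float rounding at exact boundaries. It also ignores system-reserved capacity, DaemonSets, and bin-packing losses, so treat the result as a lower bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    replicas: int
    cpu_m: int      # CPU request per replica, in millicores
    memory_mi: int  # memory request per replica, in MiB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def nodes_needed(services: List[Service], node_cpu_m: int, node_memory_mi: int,
                 headroom_pct: int = 30) -> int:
    """Worker nodes for the summed requests plus headroom; the tighter resource decides."""
    cpu = sum(s.replicas * s.cpu_m for s in services) * (100 + headroom_pct)
    mem = sum(s.replicas * s.memory_mi for s in services) * (100 + headroom_pct)
    return max(ceil_div(cpu, node_cpu_m * 100), ceil_div(mem, node_memory_mi * 100))


# Hypothetical customer deployment.
services = [
    Service("editor-api", replicas=3, cpu_m=1000, memory_mi=2048),
    Service("render-worker", replicas=2, cpu_m=2000, memory_mi=4096),
    Service("gateway", replicas=2, cpu_m=500, memory_mi=512),
]
```

With these numbers and nodes of 4 vCPU / 8 GiB (`nodes_needed(services, 4000, 8192)`), the sketch yields 3 nodes. CPU needs 8000m × 1.3 = 10400m, which is 2.6 nodes, rounded up to 3. Memory needs 15360Mi × 1.3 = 19968Mi, which is about 2.4 nodes, also rounded up to 3.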

Conclusion

IT is a dojo; one must keep training. In my first year after graduation I progressed through the entry, venturing, and cultivation phases, and I am now in the realization phase.

There is still much to learn; cloud‑native technology is an endless journey that requires perseverance and curiosity.

Outlook

Future focus: deepen my cloud‑native expertise around Kubernetes, contribute to open source, and build more custom extensions on top of existing open‑source tools.

Source: https://www.cnblogs.com/zisefeizhu/p/14601287.html


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
