What We Learned After 3 Years of Running Kubernetes at Scale
After three years of operating a multi‑data‑center Kubernetes platform for millions of devices, we share hard‑won lessons on Java container compatibility, upgrade strategies, build‑pipeline redesign, probe tuning, and external IP handling that can guide any large‑scale cloud‑native deployment.
1. Strange Cases with Java Applications
Engineers often avoided Java in microservices and containers because of its notorious memory management, but recent improvements in container compatibility have changed that perception. Many systems such as Apache Kafka and Elasticsearch still run on Java.
In 2017-2018 we ran several applications on Java 8 that struggled in Docker-style environments, crashing because of heap memory issues and erratic garbage-collection patterns. The root cause was that the JVM was not aware of Linux cgroups and namespaces, which are essential for containerization, so it did not respect the container's resource limits.
Oracle has since added experimental JVM flags such as -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap to improve compatibility, but Java still lags behind Python or Go in memory footprint and start-up speed because of JVM overhead.
When using Java, we now require version 11 or higher and set Kubernetes memory limits at least 1 GB larger than the JVM maximum heap (-Xmx) to provide a safety margin.
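The sizing rule above can be sketched as a small helper. This is only an illustration: the function name memory_limit_mib is hypothetical, it accepts only -Xmx values with a g or m suffix, and it treats the 1 GB margin as 1024 MiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a Kubernetes memory limit from a JVM max-heap
# flag such as "-Xmx2g" or "-Xmx512m" by adding a 1024 MiB safety margin.
memory_limit_mib() {
  local xmx="${1#-Xmx}"          # strip the flag prefix, leaving e.g. "2g"
  local number="${xmx%[gGmM]}"   # numeric part, e.g. "2"
  local unit="${xmx#"$number"}"  # unit suffix, e.g. "g"
  local heap_mib

  # Reject anything that is not a plain non-negative integer.
  if ! [[ "$number" =~ ^[0-9]+$ ]]; then
    echo "unsupported -Xmx value: $1" >&2
    return 1
  fi

  case "$unit" in
    g|G) heap_mib=$(( number * 1024 )) ;;
    m|M) heap_mib=$number ;;
    *)   echo "unsupported -Xmx value: $1" >&2; return 1 ;;
  esac

  # Print the limit in the "Mi" notation used by Kubernetes resource fields.
  echo "$(( heap_mib + 1024 ))Mi"
}

memory_limit_mib "-Xmx2g"
```

The printed value is meant for the container's resources.limits.memory field. The margin leaves room for the memory the JVM uses outside the heap.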
2. Kubernetes Lifecycle Management: Upgrades
Upgrading a Kubernetes cluster—especially one built on bare‑metal or virtual machines—is cumbersome. The simplest approach we found is to provision a fresh cluster with the latest version and migrate workloads, rather than attempting in‑place node upgrades.
Kubernetes consists of many moving components (Docker, CNI plugins such as Calico or Flannel, etc.) that must be upgraded in lockstep. Tools like Kubespray, KubeOne, kops, and kube-aws help, but each has trade-offs.
Using Kubespray on RHEL VMs gave us playbooks for creating clusters, adding and removing nodes, and upgrading versions, but the upgrade playbooks require stepping through every intermediate version, which can be time-consuming.
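To see how quickly the intermediate steps add up, the sketch below lists the hops between two releases. It assumes that "intermediate version" means each Kubernetes minor release in the 1.x line; the function name upgrade_path is hypothetical and is not part of Kubespray.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each minor release that has to be visited when
# upgrading one minor version at a time, e.g. from 1.14 to 1.18.
upgrade_path() {
  local from_minor="${1#1.}"   # "1.14" -> "14"
  local to_minor="${2#1.}"     # "1.18" -> "18"
  local minor
  for (( minor = from_minor + 1; minor <= to_minor; minor++ )); do
    echo "1.${minor}"
  done
}

upgrade_path 1.14 1.18
```

Each printed line stands for one full run of the upgrade playbook, which is why a cluster that has fallen several releases behind is slow to catch up.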
The key takeaway is to plan for lifecycle activities early; building and running clusters is relatively easy, but ongoing maintenance introduces a whole new set of challenges.
3. Build and Deployment
Redesigning our CI/CD pipeline for Kubernetes required extensive refactoring of Jenkins pipelines and adoption of new tools such as Helm. We introduced a new Git flow to version‑control application code, Helm charts, Docker images, and deployment manifests.
Application code and its Helm chart live in separate Git repositories, enabling independent semantic versioning.
Chart versions are tied to application versions; for example, app-1.2.0 is deployed with charts-1.1.0. If only Helm values change, we bump the chart patch version (e.g., 1.1.0 → 1.1.1). Release notes are stored in a RELEASE.txt file in each repo.
For third‑party systems like Apache Kafka or Redis that we do not build, we do not maintain a separate Git repo; Docker tags become part of the Helm chart versioning, and changing a Docker tag triggers a major chart version bump.
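The bump rules above can be expressed as a short helper. This is a sketch under assumptions: the function name and the change labels "values" and "image-tag" are made up for illustration, and resetting minor and patch to zero on a major bump follows common semantic-versioning practice rather than anything stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the next chart version.
#   values    -> only Helm values changed: bump the patch version
#   image-tag -> third-party Docker tag changed: bump the major version
bump_chart_version() {
  local version="$1" change="$2"
  local major minor patch
  IFS=. read -r major minor patch <<< "$version"

  case "$change" in
    values)    patch=$(( patch + 1 )) ;;
    image-tag) major=$(( major + 1 )); minor=0; patch=0 ;;
    *) echo "unknown change type: $change" >&2; return 1 ;;
  esac

  echo "${major}.${minor}.${patch}"
}

bump_chart_version 1.1.0 values      # prints 1.1.1
bump_chart_version 1.1.0 image-tag   # prints 2.0.0
```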
4. Liveness and Readiness Probes (Double‑Edged Sword)
Kubernetes liveness and readiness probes automatically restart failing containers and route traffic away from unhealthy pods. However, for stateful workloads such as message platforms or databases, aggressive probes can hinder startup and recovery.
Our Kafka cluster (3 brokers + 3 ZooKeeper nodes) suffered from this: when a broker needed a long index repair (10 to 30 minutes), the liveness probe failed during the repair and Kubernetes repeatedly killed the pod, so recovery never completed.
The usual mitigation is to increase initialDelaySeconds so the probe starts later, but setting it too high slows down overall resilience because Kubernetes waits longer before detecting genuine failures.
Update: Newer Kubernetes versions add a third probe type, the “startup probe” (alpha in 1.16, beta in 1.18), which holds off liveness and readiness checks until the startup probe succeeds, preventing the issue described above.
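A startup probe gives the container up to failureThreshold × periodSeconds to come up before it is restarted. The sketch below sizes failureThreshold for a worst-case startup time. The function name is hypothetical, and the 10-second period is an assumed value, not one taken from our configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: smallest failureThreshold such that
# failureThreshold * periodSeconds covers the worst-case startup time.
startup_failure_threshold() {
  local max_startup_seconds="$1" period_seconds="$2"
  # Integer ceiling division.
  echo $(( (max_startup_seconds + period_seconds - 1) / period_seconds ))
}

# A 30-minute index repair (1800 s) probed every 10 s.
startup_failure_threshold 1800 10   # prints 180
```

Because the liveness probe only takes over once the startup probe has succeeded, its own settings can stay tight, and genuine failures are still detected quickly after startup.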
5. Exposing External IP
Using static external IPs to expose services imposes a heavy load on the kernel’s connection‑tracking mechanism. Without careful planning, this can break scalability.
Our clusters run Calico as the CNI with BGP routing, and kube-proxy in iptables mode. Exposing millions of connections via external IPs forces every flow to be tracked in the kernel's netfilter conntrack table, which has a hard size limit.
When the conntrack table reaches its maximum, the OS stops accepting new connections. On RHEL we can monitor the limits with:
$ sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 167012
net.netfilter.nf_conntrack_max = 262144
One mitigation is to peer edge routers with multiple nodes, distributing incoming connections across the cluster and enlarging the effective conntrack capacity.
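The two counters can also be turned into a simple utilization check. The sketch below is illustrative only: the function name and the 80% warning threshold are assumptions. It reads the same values that sysctl reports from /proc/sys when they are available on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: conntrack table utilization as a whole-number percent.
conntrack_usage_percent() {
  local count="$1" max="$2"
  if [ "$max" -le 0 ]; then
    echo "nf_conntrack_max must be positive" >&2
    return 1
  fi
  echo $(( count * 100 / max ))
}

# With the sample values shown above:
conntrack_usage_percent 167012 262144   # prints 63

# On a live node, read the counters from procfs (present once the
# nf_conntrack module is loaded) and warn above an assumed 80% threshold.
count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
  usage=$(conntrack_usage_percent "$(cat "$count_file")" "$(cat "$max_file")")
  if [ "$usage" -ge 80 ]; then
    echo "WARNING: conntrack table is ${usage}% full" >&2
  fi
fi
```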
Reflection
Three years later we still discover new challenges in operating Kubernetes. It is a complex platform that reshapes design, architecture, and team skills, and it demands significant investment to scale and maintain.
If you can run Kubernetes as a managed service in the cloud, many operational burdens—such as CIDR expansion or version upgrades—are alleviated.
Before committing, ask yourself whether you truly need Kubernetes; the answer will help you evaluate the necessity of the platform for your use case and justify its cost.
Remember, technology for technology's sake is meaningless.
Source: https://www.infoq.cn/article/XFN54h7ctSX0O59VVkfi
