How Medium Scales Microservices with Kubernetes: Architecture, Tools, and Tuning
Medium explains why it chose Kubernetes for microservice management, describes its multi‑cluster deployment across four availability zones, details configuration tooling with Terraform, and shares scaling optimizations using a cluster over‑provisioner and pod preemption to achieve smoother node utilization.
Why Choose Kubernetes?
Medium adopted Kubernetes because it meets the company's needs for scaling, bin‑packing, and self‑healing, and because it simplifies deployment rollouts and rollbacks. All of these are critical for Medium's complex architecture.
How Medium Uses Kubernetes
The production environment spans four availability zones, each running an independent Kubernetes cluster. Although Kubernetes now supports topology management within a single cluster, Medium has not yet explored that feature.
Benefits of a Multi‑Cluster Setup
Cross‑zone traffic can be shifted when an availability zone experiences issues, providing resilience.
Infrastructure changes can be rolled out gradually by moving most production traffic to three clusters while testing changes on the fourth.
Medium uses Istio as its service mesh, together with internal controllers that manage the ingress and egress gateways, to keep configuration consistent across all clusters.
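The gradual-rollout benefit described above comes down to simple weight arithmetic: reduce one cluster's share of traffic and spread the remainder over the others. The following is a minimal sketch of that arithmetic only. The zone names, the `shift_traffic` helper, and the 4% canary share are hypothetical, and the source does not describe how Medium actually applies such weights.

```python
from __future__ import annotations


def shift_traffic(weights: dict[str, float], drained: str, keep: float = 0.0) -> dict[str, float]:
    """Lower `drained` to a share of `keep` and spread the rest proportionally.

    `weights` maps a cluster name to its share of traffic (shares sum to 1.0).
    """
    if drained not in weights:
        raise KeyError(f"unknown cluster: {drained}")
    if not 0.0 <= keep <= 1.0:
        raise ValueError("keep must be between 0.0 and 1.0")

    others = {name: w for name, w in weights.items() if name != drained}
    others_total = sum(others.values())
    if others_total <= 0:
        raise ValueError("no other cluster is available to take the traffic")

    remaining = 1.0 - keep
    shifted = {name: remaining * w / others_total for name, w in others.items()}
    shifted[drained] = keep
    return shifted


# Hypothetical starting point: four clusters sharing traffic evenly.
even = {"zone-a": 0.25, "zone-b": 0.25, "zone-c": 0.25, "zone-d": 0.25}

# Test an infrastructure change on zone-d with a small share of traffic.
canary = shift_traffic(even, "zone-d", keep=0.04)

# Take zone-b out of rotation entirely during an incident.
failover = shift_traffic(even, "zone-b")
```

The same helper covers both benefits listed above: a full drain for a failing zone (`keep=0.0`) and a partial drain for testing changes on one cluster.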
Configuration and Management
Terraform combined with an internal tool is the primary way Medium templates, renders, and applies configurations for all clusters, both production and staging. This single source‑of‑truth approach streamlines testing and applying cluster changes.
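To illustrate the single-source-of-truth idea, here is a minimal sketch that renders one shared template into per-cluster variable files. It does not reflect Medium's internal tool, which the source does not describe. The template contents, the zone names, and the assumption that staging mirrors production's four zones are all hypothetical.

```python
from string import Template

# Hypothetical per-cluster variables, written in a Terraform-like key = "value" style.
CLUSTER_TEMPLATE = Template(
    'cluster_name = "$env-$zone"\n'
    'environment  = "$env"\n'
    'zone         = "$zone"\n'
)


def render_all(envs, zones):
    """Render the shared template once for every (environment, zone) pair."""
    return {
        f"{env}-{zone}": CLUSTER_TEMPLATE.substitute(env=env, zone=zone)
        for env in envs
        for zone in zones
    }


# Assumption: both environments use the same four zones.
rendered = render_all(
    ["production", "staging"],
    ["zone-a", "zone-b", "zone-c", "zone-d"],
)

for name, body in sorted(rendered.items()):
    print(f"# {name}.tfvars")
    print(body)
```

Because every cluster is generated from the same template, a change can be tried on a staging cluster first and then applied unchanged to production.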
Scaling Optimizations for Burst Traffic
Medium invests heavily in right‑sizing resource requests based on actual utilization, improving bin‑packing efficiency. To further smooth scaling, they use a Cluster Over‑Provisioner and Pod Preemption (see https://github.com/deliveryhero/helm-charts/tree/master/stable/cluster-overprovisioner).
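The source does not say how Medium derives right-sized requests. One common approach, shown here purely as an assumption, is to take a high percentile of observed usage and add some headroom. The 95th percentile, the 15% headroom, and the sample values are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of integer samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def recommend_request(samples_millicores, pct=95, headroom_pct=15):
    """Suggest a CPU request: the chosen percentile of usage plus headroom."""
    base = percentile(samples_millicores, pct)
    # Integer ceiling of base * (1 + headroom_pct / 100).
    return (base * (100 + headroom_pct) + 99) // 100


# Hypothetical CPU usage samples for one pod, in millicores.
usage = [180, 200, 210, 220, 230, 240, 250, 260, 300, 400]
suggested = recommend_request(usage)
print(f"suggested CPU request: {suggested}m")
```

Requests that track real usage leave less reserved-but-idle capacity on each node, which is what improves bin-packing.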
For a service they call backend‑A, which may need up to 200 extra pods during traffic spikes, they configure the over‑provisioner to reserve enough CPU and memory across all four clusters. They set the over‑provisioner replica count to 50 per cluster (50 × 4 clusters = 200 pods) and enable priority preemption together with the cluster‑autoscaler.
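The sizing above is straightforward arithmetic, sketched here for concreteness. The 200-pod target and the four clusters come from the article. The per-pod CPU and memory requests are hypothetical, as is the assumption that each over-provisioner replica is sized like one backend-A pod.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequest:
    cpu_millicores: int
    memory_mib: int


def replicas_per_cluster(extra_pods: int, clusters: int) -> int:
    """Split the burst target evenly, rounding up so the total is never short."""
    if clusters <= 0:
        raise ValueError("clusters must be positive")
    return math.ceil(extra_pods / clusters)


def reserved_per_cluster(pod: PodRequest, replicas: int) -> PodRequest:
    """Total capacity held in one cluster by `replicas` placeholder pods."""
    return PodRequest(pod.cpu_millicores * replicas, pod.memory_mib * replicas)


# Hypothetical resource requests for a single backend-A pod.
backend_a = PodRequest(cpu_millicores=500, memory_mib=1024)

replicas = replicas_per_cluster(extra_pods=200, clusters=4)
reserve = reserved_per_cluster(backend_a, replicas)

print(f"over-provisioner replicas per cluster: {replicas}")
print(f"reserved per cluster: {reserve.cpu_millicores}m CPU, {reserve.memory_mib} MiB")
```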
The over‑provisioner always keeps resources for 200 additional backend‑A pods.
When a new backend‑A pod needs to be scheduled and no free capacity remains, a lower‑priority over‑provisioner pod is preempted (evicted) to free resources for it.
The evicted over‑provisioner pod becomes pending, which triggers the cluster‑autoscaler to add a new node to accommodate it.
This mechanism absorbs node‑scaling latency, allowing production services to scale without interruption.
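The sequence above can be modelled with a toy scheduler. This is a simplified simulation, not the Kubernetes scheduler or the cluster-autoscaler. The class names, CPU units, priorities, and pod names are all hypothetical, and real preemption considers many more factors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Pod:
    name: str
    cpu: int        # requested CPU, arbitrary units
    priority: int   # higher wins; placeholders use a negative value


@dataclass(eq=False)
class Node:
    name: str
    capacity: int
    pods: List[Pod] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(p.cpu for p in self.pods)


class Cluster:
    def __init__(self, node_capacity: int) -> None:
        self.node_capacity = node_capacity
        self.nodes: List[Node] = []
        self.pending: List[Pod] = []

    def add_node(self) -> Node:
        node = Node(f"node-{len(self.nodes) + 1}", self.node_capacity)
        self.nodes.append(node)
        return node

    def _try_place(self, pod: Pod) -> Optional[str]:
        # First choice: a node that already has enough free capacity.
        for node in self.nodes:
            if node.free() >= pod.cpu:
                node.pods.append(pod)
                return node.name
        # Second choice: evict strictly lower-priority pods on one node.
        for node in self.nodes:
            victims = sorted(
                (p for p in node.pods if p.priority < pod.priority),
                key=lambda p: p.priority,
            )
            freed, chosen = node.free(), []
            for victim in victims:
                if freed >= pod.cpu:
                    break
                chosen.append(victim)
                freed += victim.cpu
            if freed >= pod.cpu:
                for victim in chosen:
                    node.pods.remove(victim)
                    self.pending.append(victim)  # evicted pods wait for capacity
                node.pods.append(pod)
                return node.name
        return None

    def schedule(self, pod: Pod) -> Optional[str]:
        placed = self._try_place(pod)
        if placed is None:
            self.pending.append(pod)
        return placed

    def autoscale(self) -> int:
        """Place pending pods, adding a node whenever one does not fit."""
        added = 0
        waiting, self.pending = self.pending, []
        for pod in waiting:
            if pod.cpu > self.node_capacity:
                raise ValueError(f"{pod.name} cannot fit on any node")
            if self._try_place(pod) is None:
                self.add_node().pods.append(pod)
                added += 1
        return added


# One full node: two real pods plus two low-priority placeholders.
cluster = Cluster(node_capacity=4)
first = cluster.add_node()
first.pods += [Pod("backend-a-1", 1, 0), Pod("backend-a-2", 1, 0),
               Pod("overprovision-1", 1, -1), Pod("overprovision-2", 1, -1)]

placed_on = cluster.schedule(Pod("backend-a-3", 1, 0))  # preempts a placeholder
evicted = [p.name for p in cluster.pending]             # placeholder now pending
nodes_added = cluster.autoscale()                       # new node for the placeholder
```

In this model the new backend-A pod is placed immediately on the existing node, and only the placeholder waits for the new node. That is the sense in which the mechanism absorbs node-scaling latency.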
After applying over‑provisioning and right‑sizing, the total node count across the four clusters dropped from frequent peaks of 800‑900 nodes to a stable range of 400‑600 nodes, as illustrated by the accompanying charts.
Conclusion
Kubernetes brings immense complexity but also limitless configurability. Medium is proud of shaping Kubernetes to its needs while continuously exploring new techniques to boost reliability and scalability.